Abstract

We give near-tight lower bounds for the sparsity required in several dimensionality reducing linear maps. First, consider the Johnson-Lindenstrauss (JL) lemma, which states that for any set of $n$ vectors in $\mathbb{R}^d$ there is a matrix $A \in \mathbb{R}^{m\times d}$ with $m = O(\varepsilon^{-2}\log n)$ such that mapping by $A$ preserves pairwise Euclidean distances of these $n$ vectors up to a $1 \pm \varepsilon$ factor. We show that there exists a set of $n$ vectors such that any such matrix $A$ with at most $s$ non-zero entries per column must have $s = \Omega(\varepsilon^{-1}\log n/\log(1/\varepsilon))$ as long as $m < O(n/\log(1/\varepsilon))$. This bound improves the lower bound of $\Omega(\min\{\varepsilon^{-2}, \varepsilon^{-1}\sqrt{\log_m d}\})$ by [Dasgupta-Kumar-Sarlos, STOC 2010], which only held against the stronger property of distributional JL, and only against a certain restricted class of distributions. Meanwhile our lower bound is against the JL lemma itself, with no restrictions. Our lower bound matches the sparse Johnson-Lindenstrauss upper bound of [Kane-Nelson, SODA 2012] up to an $O(\log(1/\varepsilon))$ factor.

Next, we show that any $m\times n$ matrix with the $k$-restricted isometry property (RIP) with constant distortion must have at least $\Omega(k\log(n/k))$ non-zeroes per column if $m = O(k\log(n/k))$, the optimal number of rows of RIP matrices, and $k < n/\mathrm{polylog}\, n$. This improves the previous lower bound of $\Omega(\min\{k, n/m\})$ by [Chandar, 2010] and shows that for virtually all $k$ it is impossible to have a sparse RIP matrix with an optimal number of rows.

Both lower bounds above also offer a tradeoff between sparsity and the number of rows. 

Lastly, we show that any oblivious distribution over subspace embedding matrices with 1 non-zero per column and preserving distances in a $d$-dimensional subspace up to a constant factor must have at least $\Omega(d^2)$ rows. This matches one of the upper bounds in [Nelson-Nguyen, 2012] and shows the impossibility of obtaining the best of both constructions in that work, namely 1 non-zero per column and $O(d)$ rows.

1 Introduction 



The last decade has witnessed a burgeoning interest in algorithms for large-scale data. A common feature in many of these works is the exploitation of data sparsity to achieve algorithmic efficiency, for example to have running times proportional to the actual complexity of the data rather than the dimension of the ambient space it lives in. This approach has found applications in compressed sensing [CT05, Don06], dimension reduction [BOR10, DKS10, KN10, KN12, WDL+09], and numerical linear algebra [CW12, MM12, MP12, NN12]. Given the success of these algorithms, it is important to understand their limitations. Until now, for most of these problems it has not been known how far one can reduce the running time on sparse inputs. In this work we take a step towards understanding the performance of algorithms for sparse data and show several tight lower bounds.

In this work we provide three main contributions. We give near-optimal or optimal sparsity lower bounds for Johnson-Lindenstrauss transforms, matrices satisfying the restricted isometry property for use in compressed sensing, and subspace embeddings used in numerical linear algebra. These three contributions are discussed in Section 1.1, Section 1.2, and Section 1.3, respectively.

1.1 Johnson-Lindenstrauss 

The following lemma, due to Johnson and Lindenstrauss [JL84], has been used widely in many areas of computer science to reduce data dimension.

Theorem 1 (Johnson-Lindenstrauss (JL) lemma [JL84]). For any $0 < \varepsilon < 1/2$ and any $x_1, \ldots, x_n$ in $\mathbb{R}^d$, there exists $A \in \mathbb{R}^{m\times d}$ with $m = O(\varepsilon^{-2}\log n)$ such that for all $i, j \in [n]$¹,

$$\|Ax_i - Ax_j\|_2 = (1 \pm \varepsilon)\|x_i - x_j\|_2.$$

Typically one uses the lemma in algorithm design by mapping some instance of a high-dimensional computational geometry problem to a lower dimension. The running time to solve the instance then becomes the time needed for the lower-dimensional problem, plus the time to perform the matrix-vector multiplications $Ax_i$; see [Ind01, Vem04] for further discussion. This latter step highlights the importance of having a JL matrix supporting fast matrix-vector multiplication. The original proofs of the JL lemma took $A$ to be a random dense matrix, e.g. with i.i.d. Gaussian, Rademacher, or even subgaussian entries [Ach03, AV06, DG03, FM88, IM98, JL84, Mat08]. The time to compute $Ax$ then becomes $O(m \cdot \|x\|_0)$, where $x$ has $\|x\|_0 \le d$ non-zero entries.

A beautiful work of Ailon and Chazelle [AC09] described a construction of a JL matrix $A$ supporting matrix-vector multiplication in time $O(d\log d + m^3)$, also with $m = O(\varepsilon^{-2}\log n)$. This was improved to $O(d\log d + m^{2+\gamma})$ [AL09] with the same $m$ for any constant $\gamma > 0$, or to $O(d\log d)$ with $m = O(\varepsilon^{-2}\log n\log^4 d)$ [AL11, KW11]. Thus if $\varepsilon^{-2}\log n \ll \sqrt{d}$ one can obtain nearly-linear $O(d\log d)$ embedding time with the same target dimension $m$ as the original JL lemma, or one can also obtain nearly-linear time for any setting of $\varepsilon, n$ by increasing $m$ slightly by $\mathrm{polylog}\, d$ factors.

While the previous paragraph may seem to present the end of the story, in fact the "nearly-linear" $O(d\log d)$ embedding time is actually much worse than the original $O(m \cdot \|x\|_0)$ time of dense JL matrices when $\|x\|_0$ is very small, i.e. when $x$ is sparse. Indeed, in several applications we expect $x$ to be sparse. Consider the bag of words model in information retrieval: in, for example, an email spam collaborative filtering system for Yahoo! Mail [WDL+09], each email is treated as a $d$-dimensional vector where $d$ is the size of the lexicon. The $i$th entry of the vector is some weighted count of the number of occurrences of word $i$ (frequent words like "the" should be weighted less heavily). A machine learning algorithm is employed to learn a spam classifier, which involves dot products of email vectors with some learned classifier vector, and JL dimensionality reduction is used to speed up the repeated dot products that are computed during training. Note that in this scenario we expect $x$ to be sparse since most emails do not contain nearly every word in the lexicon. An even starker scenario is the turnstile streaming model, where the vectors $x$ may receive coordinate-wise updates in a data stream. In this case maintaining $Ax$ in a stream given some update of the form "add $v$ to $x_i$" requires adding $vAe_i$ to the compression $Ax$ stored in memory. Since $\|e_i\|_0 = 1$, we would not like to spend $O(d\log d)$ per streaming update.

¹ Here and throughout this paper, $[n]$ denotes the set $\{1, \ldots, n\}$.
The intuition behind all the works [AC09, AL09, AL11, KW11] to obtain $O(d\log d)$ embedding time was as follows. Picking $A$ to be a scaled sampling matrix (where each row has a 1 in a random location) gives the correct expectation for $\|Ax\|_2^2$, but the variance may be too high. Indeed, the variance is high exactly when $x$ is sparse; consider the extreme case where $\|x\|_0 = 1$, so that sampling is not even expected to see the non-zero coordinate unless $m \ge d$. These works then all essentially proceed by randomly preconditioning $x$ to ensure that $x$ is very well-spread (i.e. far from sparse) with high probability, so that sampling works, and thus fundamentally cannot take advantage of input sparsity. One way of obtaining faster matrix-vector multiplication for sparse inputs is to have sparse JL matrices $A$. Indeed, if $A$ has at most $s$ non-zero entries per column then $Ax$ can be computed in $O(s \cdot \|x\|_0 + m)$ time. A line of work [Ach03, Mat08, DKS10, BOR10, KN10, KN12] investigated the value $s$ achievable in a JL matrix, culminating in [KN12] showing that it is possible to simultaneously have $m = O(\varepsilon^{-2}\log n)$ and $s = O(\varepsilon^{-1}\log n)$. Such a sparse JL transform thus speeds up embeddings by a factor of roughly $1/\varepsilon$ without increasing the target dimension.
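As a rough illustration of the $O(s \cdot \|x\|_0 + m)$ cost, here is a minimal Python sketch, assuming a column-sparse matrix stored as per-column lists of (row, value) pairs. The random construction (each column gets $s$ distinct random rows with values $\pm 1/\sqrt{s}$) is only in the spirit of the sparse JL matrices cited above; it is not claimed to reproduce the distribution of [KN12], and the names `sparse_sign_matrix` and `sparse_matvec` are hypothetical.

```python
import math
import random


def sparse_sign_matrix(m, d, s, seed=0):
    """Hypothetical sparse sign matrix: each of the d columns has exactly s
    non-zeros, placed in s distinct random rows with values +-1/sqrt(s).
    Columns are stored as lists of (row, value) pairs."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(s)
    return [
        [(row, rng.choice((-scale, scale))) for row in rng.sample(range(m), s)]
        for _ in range(d)
    ]


def sparse_matvec(cols, m, x):
    """Compute A @ x for a sparse x given as {index: value}.

    Allocating the output costs O(m); each non-zero x_j then touches only the
    s stored entries of column j, for O(s * ||x||_0 + m) total work."""
    y = [0.0] * m
    for j, xj in x.items():
        for row, a in cols[j]:
            y[row] += a * xj
    return y


if __name__ == "__main__":
    m, d, s = 64, 10_000, 8
    A = sparse_sign_matrix(m, d, s)
    x = {17: 1.5, 4_242: -2.0}  # a 2-sparse input in R^d
    y = sparse_matvec(A, m, x)
    print(sum(v * v for v in y))  # squared norm of the embedding
```

The loop over `x.items()` never touches the zero coordinates of $x$, which is exactly the advantage over the preconditioning-based transforms discussed above.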

Our Contribution I: We show that for any $n \ge 2$ and any $\varepsilon = \Omega(1/\sqrt{n})$, there exists a set of $n$ vectors $x_1, \ldots, x_n \in \mathbb{R}^n$ such that any JL matrix for this set of vectors with $m = O(\varepsilon^{-2}\log n)$ rows requires column sparsity $s = \Omega(\varepsilon^{-1}\log n/\log(1/\varepsilon))$ as long as $m = O(n/\log(1/\varepsilon))$. Thus the sparse JL transforms of [KN12] achieve optimal sparsity up to an $O(\log(1/\varepsilon))$ factor. In fact this lower bound on $s$ continues to hold even if $m = O(\varepsilon^{-c}\log n)$ for any positive constant $c$.

Note that if $m = n$ one can simply take $A$ to be the identity matrix, which achieves $s = 1$, and thus the restriction $m = O(n/\log(1/\varepsilon))$ is nearly optimal. Also note that we can assume $\varepsilon = \Omega(1/\sqrt{n})$ since otherwise $m = \Omega(n)$ is required in any JL matrix [Alo09], and thus the $m = O(n/\log(1/\varepsilon))$ restriction is no worse than requiring $m = O(n/\log n)$. Furthermore if all the non-zero entries of $A$ are required to be equal in magnitude, our lower bound holds as long as $m \le n/10$.

Before our work, only a restricted lower bound of $s = \Omega(\min\{1/\varepsilon^2, \varepsilon^{-1}\sqrt{\log_m d}\})$ had been shown [DKS10]. In fact this lower bound only applied to the distributional JL problem, a much stronger guarantee where one wants to design a distribution over $m\times d$ matrices such that any fixed vector $x$ has $\|Ax\|_2 = (1 \pm \varepsilon)\|x\|_2$ with probability $1 - \delta$ over the choice of $A$. Indeed any distributional JL construction yields the JL lemma by setting $\delta = 1/n^2$ and union bounding over all the $x_i - x_j$ difference vectors. Thus, aside from the weaker lower bound on $s$, [DKS10] only provided a lower bound against this stronger guarantee, and furthermore only for a certain restricted class of distributions that made certain independence assumptions amongst matrix entries, and also assumed certain bounds on the sum of fourth moments of matrix entries in each row.
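The union-bound step mentioned above can be written out explicitly; this is only the arithmetic implied by the text, with $\delta = 1/n^2$ and one failure event per unordered pair $\{i, j\}$:

```latex
\Pr_A\Big[\exists\, i<j:\ \|A(x_i-x_j)\|_2 \notin (1\pm\varepsilon)\|x_i-x_j\|_2\Big]
  \;\le\; \binom{n}{2}\,\delta
  \;=\; \frac{n(n-1)}{2}\cdot\frac{1}{n^2}
  \;=\; \frac{n-1}{2n} \;<\; \frac{1}{2}.
```

So a single draw of $A$ preserves all $\binom{n}{2}$ distances with probability greater than $1/2$, and in particular some matrix in the support of the distribution is a JL matrix for $x_1, \ldots, x_n$.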

It was shown by Alon [Alo09] that $m = \Omega(\varepsilon^{-2}\log n/\log(1/\varepsilon))$ is required for the set of points $\{0, e_1, \ldots, e_n\}$ and $d = n$ as long as $1/\varepsilon^2 < n/2$. Here $e_i$ is the $i$th standard basis vector. Simple manipulations show that, when appropriately scaled, any JL matrix $A$ for this set of vectors is $O(\varepsilon)$-incoherent, in the sense that all its columns $v_1, \ldots, v_n$ have unit $\ell_2$ norm and the dot products $\langle v_i, v_j\rangle$ between pairs of columns are all at most $O(\varepsilon)$ in magnitude. We study this exact same hard input to the JL lemma; what we show is that any such matrix $A$ must have column sparsity $s = \Omega(\varepsilon^{-1}\log n/\log(1/\varepsilon))$.
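The "simple manipulations" rest on the polarization identity $\langle Ae_i, Ae_j\rangle = (\|Ae_i\|_2^2 + \|Ae_j\|_2^2 - \|Ae_i - Ae_j\|_2^2)/2$: if the three squared norms are within a $1 \pm O(\varepsilon)$ factor of $1$, $1$, and $2$ respectively, the inner product is $O(\varepsilon)$. A minimal pure-Python sketch for measuring the coherence of a matrix after normalizing its columns (the helper name `coherence` is hypothetical) might look like this:

```python
import math
import random


def coherence(columns):
    """Return max |<v_i, v_j>| over pairs i < j, after scaling each column
    to unit l2 norm. Columns are given as equal-length lists of floats."""
    unit = []
    for col in columns:
        norm = math.sqrt(sum(a * a for a in col))
        unit.append([a / norm for a in col])
    best = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            dot = sum(a * b for a, b in zip(unit[i], unit[j]))
            best = max(best, abs(dot))
    return best


if __name__ == "__main__":
    # The columns of a dense Gaussian map are the images A e_i of the basis vectors.
    rng = random.Random(0)
    m, n = 200, 30
    cols = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    print(coherence(cols))  # typically small when m is much larger than log n
```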

In some sense our lower bound can be viewed as a generalization of the Singleton bound for error-correcting codes in a certain parameter regime. The Singleton bound states that for any set of $n$ codewords with block length $t$, alphabet size $q$, and minimum distance $r$, it must be that $n \le q^{t-r+1}$. If the code has relative distance $r/t \ge 1 - \varepsilon$ then $t - r \le \varepsilon t$, so that if $t \ge 1/\varepsilon$ the Singleton bound gives $\log n \le (\varepsilon t + 1)\log q \le 2\varepsilon t\log q$, i.e. $t = \Omega(\varepsilon^{-1}\log n/\log q)$. The connection to incoherent matrices (and thus the JL lemma), observed in [Alo09], is the following. For any such code $\{C_1, \ldots, C_n\}$, form a matrix $A \in \mathbb{R}^{m\times n}$ with $m = qt$. The rows are partitioned into $t$ chunks each of size $q$. In the $i$th column of $A$, in the $j$th chunk we put a $1/\sqrt{t}$ in the row of that chunk corresponding to the symbol $(C_i)_j$, and we put zeroes everywhere else in that column. All columns then have $\ell_2$ norm 1, and the code having relative distance $1 - \varepsilon$ implies that all pairs of columns have dot products at most $\varepsilon$. The Singleton bound thus implies that any incoherent matrix formed from codes in this way has $t = \Omega(\varepsilon^{-1}\log n/\log q)$. Note the column sparsity of $A$ is $t$, and thus this matches our lower bound for $q \le \mathrm{poly}(1/\varepsilon)$. Our sparsity lower bound thus recovers this Singleton-like bound, without the requirement that the matrix takes this special structure of being formed from a code in the manner described above. One reason this is perhaps surprising is that incoherent matrices from codes have all nonnegative entries; our lower bound thus implies that the use of negative entries cannot be exploited to obtain sparser incoherent matrices.
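To make the code-based construction concrete, here is a small sketch using Reed-Solomon codes, which meet the Singleton bound with equality. The choice of Reed-Solomon codes and the helper names are our own illustration, not taken from [Alo09]. Codewords are evaluations of polynomials of degree less than $k$ over the prime field $\mathbb{F}_q$ at the $t \le q$ points $0, \ldots, t-1$, so two distinct codewords agree in at most $k - 1$ positions and the resulting columns have pairwise dot products at most $(k-1)/t$.

```python
import itertools
import math


def reed_solomon_columns(q, k, t):
    """Columns of the (q*t) x q**k matrix built from a Reed-Solomon code.

    q must be prime and t <= q. Each column corresponds to a polynomial p of
    degree < k over F_q; in chunk j (rows j*q .. j*q + q - 1) it has a single
    entry 1/sqrt(t) in the row indexed by the symbol p(j) mod q. Columns are
    returned as dicts {row: value} with exactly t non-zeros."""
    assert t <= q
    val = 1.0 / math.sqrt(t)
    cols = []
    for coeffs in itertools.product(range(q), repeat=k):
        col = {}
        for j in range(t):
            symbol = sum(c * pow(j, e, q) for e, c in enumerate(coeffs)) % q
            col[j * q + symbol] = val
        cols.append(col)
    return cols


def max_abs_dot(cols):
    """Largest |<v_a, v_b>| over pairs of distinct sparse columns."""
    best = 0.0
    for a, b in itertools.combinations(cols, 2):
        dot = sum(v * b[r] for r, v in a.items() if r in b)
        best = max(best, abs(dot))
    return best


if __name__ == "__main__":
    q, k, t = 5, 2, 5
    cols = reed_solomon_columns(q, k, t)
    print(len(cols), "columns, m =", q * t, "rows, sparsity", t)
    print("coherence:", max_abs_dot(cols), "bound:", (k - 1) / t)
```

All entries are nonnegative, matching the remark above that code-based incoherent matrices never use negative entries.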

1.2 Compressed sensing and the restricted isometry property 

Another object of interest is the class of matrices satisfying the restricted isometry property (RIP). Such matrices are widely used in compressed sensing.

Definition 2 ([CT05, CRT06b, Can08]). For any integer $k > 0$, a matrix $A$ is said to have the $k$-restricted isometry property with distortion $\delta_k$ if $(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2$ for all $x$ with $\|x\|_0 \le k$.
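For very small matrices the distortion $\delta_k$ in Definition 2 can be computed by brute force: for each support $T$ of size $k$, the extreme values of $\|Ax\|_2^2/\|x\|_2^2$ over $x$ supported on $T$ are the extreme eigenvalues of $A_T^\top A_T$. The NumPy sketch below (the function name is hypothetical) enumerates all $\binom{n}{k}$ supports, so it is only meant for toy examples. Supports smaller than $k$ need not be enumerated since, by eigenvalue interlacing, their extreme eigenvalues lie between those of any size-$k$ superset.

```python
import itertools

import numpy as np


def rip_distortion(A, k):
    """Smallest delta with (1-delta)||x||^2 <= ||Ax||^2 <= (1+delta)||x||^2
    for every x with at most k non-zeros (brute force over supports)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(n), k):
        sub = A[:, list(support)]
        eigs = np.linalg.eigvalsh(sub.T @ sub)  # eigenvalues in ascending order
        delta = max(delta, eigs[-1] - 1.0, 1.0 - eigs[0])
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, k = 20, 12, 2
    A = rng.standard_normal((m, n)) / np.sqrt(m)  # columns have norm close to 1
    print(rip_distortion(A, k))
```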

The goal of the area of compressed sensing is to take few nonadaptive linear measurements of a vector $x \in \mathbb{R}^n$ to allow for later recovery from those measurements. That is to say, if those measurements are organized as the rows of some matrix $A \in \mathbb{R}^{m\times n}$, we would like to recover $x$ from $Ax$. Furthermore, we would like to do so with $m \ll n$ so that $Ax$ is a compressed representation of $x$. Of course if $m < n$ we cannot recover all vectors $x \in \mathbb{R}^n$ with any meaningful guarantee, since then $A$ will have a non-trivial kernel, and $x, x + y$ are indistinguishable for $y \in \ker(A)$. Compressed sensing literature has typically focused on the case of $x$ being sparse [CRT06a, Don06], in which case a recovery algorithm could hope to recover $x$ by finding the sparsest $\tilde{x}$ such that $A\tilde{x} = Ax$.

The works [Can08, CRT06b, CT05] show that if $A$ satisfies the $2k$-RIP with distortion $\delta_{2k} < \sqrt{2} - 1$, and if $x$ is $k$-sparse, then given $Ax$ there is a polynomial-time solvable linear program to recover $x$. In fact for any $x$, not necessarily sparse, the linear program recovers a vector $\tilde{x}$ satisfying

$$\|x - \tilde{x}\|_2 \le O(1/\sqrt{k}) \cdot \inf_{\|z\|_0 \le k} \|x - z\|_1,$$

known as the $\ell_2/\ell_1$ guarantee. That is, the recovery error depends on the $\ell_1$ norm of $x - z$, where $z$ is the best $k$-sparse approximation to $x$.
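The infimum in the $\ell_2/\ell_1$ guarantee is attained by keeping the $k$ largest-magnitude entries of $x$ and zeroing the rest, so it equals the sum of the $n - k$ smallest magnitudes. A short Python sketch of this quantity and of the resulting error budget follows; the helper names are hypothetical, and the constant `C` is a placeholder since the guarantee above only specifies $O(1/\sqrt{k})$.

```python
import math


def best_k_term_tail(x, k):
    """min over k-sparse z of ||x - z||_1: drop the k largest |x_i| and
    sum the magnitudes of everything else."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[k:])


def l2_l1_error_budget(x, k, C=1.0):
    """Right-hand side C / sqrt(k) * tail_k(x) of the l2/l1 guarantee.
    C is a placeholder for the unspecified constant in O(1/sqrt(k))."""
    return C / math.sqrt(k) * best_k_term_tail(x, k)


if __name__ == "__main__":
    x = [5.0, -1.0, 3.0, 0.5, 0.0, -0.25]
    print(best_k_term_tail(x, 2))  # 1.75
    print(l2_l1_error_budget(x, 2))
```

For an exactly $k$-sparse $x$ the tail is zero, which recovers the exact-recovery statement for $k$-sparse inputs.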

It is known [BIPW10, GG84, Kas77] that any matrix $A$ allowing for the $\ell_2/\ell_1$ guarantee simultaneously for all vectors $x$, and thus RIP matrices, must have $m = \Omega(k\log(n/k))$ rows. For completeness we give a proof of the new stronger lower bound $m = \Omega(\log^{-1}(1/\delta_k)\,(\delta_k^{-1}k\log(n/k) + \delta_k^{-2}k))$ in Section 5, though we remark here that current uses of RIP all take $\delta_k = \Theta(1)$.

Although the recovery $\tilde{x}$ of $x$ can be found in polynomial time as mentioned above, this polynomial is quite large as the algorithm involves solving a linear program with $n$ variables and $m$ constraints. This downside has led researchers to design alternative measurement and/or recovery schemes which allow for much faster sparse recovery, sometimes even at the cost of obtaining a recovery guarantee weaker than $\ell_2/\ell_1$ recovery for the sake of algorithmic performance. Many of these schemes are iterative, such as CoSaMP [NT09], Expander Matching Pursuit [IR08], and several others [BI09, BIR08, BD08, DTDS12, Fou11, GK09, NV09, NV10, TG07], and several of their running times depend on the product of the number of iterations and the time required to multiply by $A$ or $A^*$ (here $A^*$ denotes the conjugate transpose of $A$). Several of these algorithms furthermore apply $A, A^*$ to vectors which are themselves sparse. Thus, recovery time is improved significantly in the case that $A$ is sparse. Previously the only known lower bound for column sparsity $s$ for an RIP matrix with an optimal $m = \Theta(k\log(n/k))$ number of rows was $s = \Omega(\min\{k, n/m\})$ [Cha10]. Note that if an RIP construction existed matching the [Cha10] column sparsity lower bound, application to a $k$-sparse vector would take time $O(\min\{k^2, nk/m\})$, which is always $o(n)$ and can be very fast for small $k$. Furthermore, in several applications of compressed sensing $m$ is a constant fraction of $n$, in which case an $\Omega(n/m)$ lower bound on column sparsity does not rule out very sparse RIP matrices. For example, in applications of compressed sensing to magnetic resonance imaging, [LDP07] recommended setting the number of measurements $m$ to be between 5-10% of $n$ to obtain good performance for recovery of brain and angiogram images. We remark that one could also obtain speedup by using structured RIP matrices, such as those obtained by sampling rows of the discrete Fourier matrix [CT06], though such constructions require matrix-vector multiplication time $\Theta(n\log n)$ independent of input sparsity.

Another upside of sparse RIP matrices is that they allow faster algorithms for encoding $x \mapsto Ax$. If $A$ has $s$ non-zeroes per column and $x$ receives, for example, turnstile streaming updates, then the compression $Ax$ can be maintained on the fly in $O(s)$ time per update (assuming the non-zero entries of any column of $A$ can be recovered in $O(s)$ time).
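A minimal sketch of this streaming scenario, assuming (as in the parenthetical above) that the non-zeros of each column can be listed in $O(s)$ time; here they are simply stored as a Python list per column, and the class and method names are hypothetical:

```python
class SparseSketch:
    """Maintains y = A x under turnstile updates "x_i += v".

    A is given column-wise as lists of (row, value) pairs with at most s
    entries each, so every update costs O(s) regardless of n."""

    def __init__(self, m, columns):
        self.columns = columns
        self.y = [0.0] * m

    def update(self, i, v):
        # y += v * A e_i: only the stored non-zeros of column i are touched.
        for row, a in self.columns[i]:
            self.y[row] += v * a


if __name__ == "__main__":
    # A toy 3 x 4 matrix with s = 2 non-zeros per column.
    cols = [[(0, 1.0), (2, -1.0)], [(1, 1.0), (2, 1.0)],
            [(0, -1.0), (1, 1.0)], [(0, 1.0), (1, -1.0)]]
    sk = SparseSketch(3, cols)
    for i, v in [(0, 2.0), (3, 1.0), (0, -2.0), (1, 0.5)]:
        sk.update(i, v)
    print(sk.y)  # [1.0, -0.5, 0.5], i.e. A x for x = (0, 0.5, 0, 1)
```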

Our Contribution II: We show that as long as $k < n/\mathrm{polylog}\, n$, any $k$-RIP matrix with distortion $O(1)$ and $m = O(k\log(n/k))$ rows with $s$ non-zero entries per column must have $s = \Omega(k\log(n/k))$. That is, RIP matrices with the optimal number of rows must be dense for almost the full range of $k$ up to $n$. This lower bound strongly rules out any hope for faster recovery and compression algorithms for compressed sensing by using sparse RIP matrices as mentioned above.

We note that any sparsity lower bound should fail as $k$ approaches $n$ since the $n\times n$ identity matrix trivially satisfies $k$-RIP for any $k$ and has column sparsity 1. Thus, our lower bound holds for almost the full range of parameters for $k$.

1.3 Oblivious Subspace Embeddings 

The last problem we consider is the oblivious subspace embedding (OSE) problem. Here one aims to design a distribution $\mathcal{D}$ over $m\times n$ matrices $A$ such that for any $d$-dimensional subspace $W \subset \mathbb{R}^n$,

$$\Pr_{A\sim\mathcal{D}}\left(\forall x \in W:\ \|Ax\|_2 \in (1 \pm \varepsilon)\|x\|_2\right) \ge 2/3.$$
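For a fixed draw of $A$, whether the event inside the probability holds can be checked exactly: if the columns of $U \in \mathbb{R}^{n\times d}$ form an orthonormal basis of $W$, then every $x \in W$ is $x = Uy$ with $\|x\|_2 = \|y\|_2$, so the extreme values of $\|Ax\|_2/\|x\|_2$ over $W$ are the largest and smallest singular values of $AU$. A NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def subspace_distortion(A, basis):
    """Smallest eps with ||Ax||_2 in [(1-eps)||x||_2, (1+eps)||x||_2] for all x
    in the column span of `basis` (an n x d matrix of full column rank)."""
    Q, _ = np.linalg.qr(np.asarray(basis, dtype=float))  # orthonormal basis U of W
    sv = np.linalg.svd(np.asarray(A, dtype=float) @ Q, compute_uv=False)
    return max(sv.max() - 1.0, 1.0 - sv.min())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, m = 500, 5, 200
    W = rng.standard_normal((n, d))               # a random d-dimensional subspace
    A = rng.standard_normal((m, n)) / np.sqrt(m)  # one draw of a dense Gaussian map
    print(subspace_distortion(A, W))
```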

Sarlos showed in [Sar06] that OSEs are useful for approximate least squares regression and low rank approximation, and they have also been shown useful for approximating statistical leverage scores [DMMW12], an important concept in statistics and machine learning. See [CW12] for an overview of several applications of OSEs.

To give more details of how OSEs are typically used, consider the example of solving an overconstrained least-squares regression problem, where one must compute $\mathrm{argmin}_x \|Sx - b\|_2$ for some $S \in \mathbb{R}^{n\times d}$. By overconstrained we mean $n > d$, and really one should imagine $n \gg d$ in what follows. There is a closed form solution for the minimizing vector $x$, which requires computing the Moore-Penrose pseudoinverse of $S$. The total running time is $O(nd^{\omega-1})$, where $\omega$ is the exponent of square matrix multiplication.

Now suppose we are only interested in finding some $\tilde{x}$ so that

$$\|S\tilde{x} - b\|_2 \le (1 + \varepsilon) \cdot \min_x \|Sx - b\|_2.$$

Then it suffices to have a matrix $A$ such that $\|Az\|_2 = (1 \pm O(\varepsilon))\|z\|_2$ for all $z$ in the subspace spanned by $b$ and the columns of $S$ (every residual $Sx - b$ lies in this subspace, so all residual norms are preserved simultaneously), in which case we could obtain such an $\tilde{x}$ by solving the new least squares regression problem of computing $\mathrm{argmin}_x \|ASx - Ab\|_2$. If $A$ has $m$ rows, the new running time is the sum of three terms: (1) the time to compute $Ab$, (2) the time to compute $AS$, and (3) the $O(md^{\omega-1})$ time required to solve the new least-squares problem. It turns out it is possible to obtain such an $A$ with $m = O(d/\varepsilon^2)$ by choosing, for example, a matrix with independent Gaussian entries (see e.g. [Gor88, KM05]), but then computing $AS$ takes time $\Omega(nd^{\omega-1})$, providing no benefit.

The work of Sarlos picked $A$ with special structure so that $AS$ can be computed in time $O(nd\log n)$, namely by using the Fast Johnson-Lindenstrauss Transform of [AC09] (see also [Tro11]). Unfortunately the time is $O(nd\log n)$ even for sparse matrices $S$, and several applications require solving numerical linear algebra problems on sparse matrix inputs. For example in the Netflix matrix, where rows are users, columns are movies, and $S_{ij}$ is some rating score, $S$ is very sparse since most users rate only a tiny fraction of all movies [ZWSP08]. If $\mathrm{nnz}(S)$ denotes the number of non-zero entries of $S$, we would like running times closer to $O(\mathrm{nnz}(S))$ than $O(nd\log n)$ to multiply $A$ by $S$. Such a running time would be possible, for example, if $A$ only had $s = O(1)$ non-zero entries per column.

In a recent and surprising work, Clarkson and Woodruff [CW12] gave an OSE with $m = \mathrm{poly}(d/\varepsilon)$ and $s = 1$, thus providing fast numerical linear algebra algorithms for sparse matrices. For example, the running time for least-squares regression becomes $O(\mathrm{nnz}(S) + \mathrm{poly}(d/\varepsilon))$. The dependence on $d, \varepsilon$ was improved in [NN12] to $m = O(d^2/\varepsilon^2)$. The work [NN12] also showed how to obtain $m = O(d^{1+\gamma}/\varepsilon^2)$, $s = O(1/\varepsilon)$ for any constant $\gamma > 0$ (the constant in the big-Oh depends polynomially on $1/\gamma$), or $m = (d\,\mathrm{polylog}\, d)/\varepsilon^2$, $s = (\mathrm{polylog}\, d)/\varepsilon$. It is thus natural to ask whether one can obtain the best of both worlds: can there be an OSE with $m \approx d/\varepsilon^2$ and $s = 1$?
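The $s = 1$ regime can be illustrated with a CountSketch-style matrix, which has exactly one non-zero entry, a random sign, per column, placed in a uniformly random row. The NumPy sketch below applies such an $A$ to $S$ and $b$ in one pass over their rows and then solves the small $m\times d$ problem. It only shows the mechanics of sketch-and-solve: the choice of $m$ in the example is arbitrary, no claim is made that it reproduces the exact parameters or analysis of [CW12] or [NN12], and the function names are hypothetical.

```python
import numpy as np


def countsketch_apply(rows, signs, m, M):
    """Return A @ M where column j of A is signs[j] * e_{rows[j]} (s = 1).

    Each row j of M is added, with its sign, into output row rows[j], so this
    is a single pass over M; a sparse-matrix version would touch only the
    non-zeros of M."""
    M = np.asarray(M, dtype=float)
    out = np.zeros((m,) + M.shape[1:])
    weights = signs.reshape((-1,) + (1,) * (M.ndim - 1))
    np.add.at(out, rows, weights * M)
    return out


def sketch_and_solve(S, b, m, seed=0):
    """Approximately solve argmin_x ||Sx - b||_2 via argmin_x ||ASx - Ab||_2."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    rows = rng.integers(0, m, size=n)        # one random row per column of A
    signs = rng.choice([-1.0, 1.0], size=n)  # one random sign per column of A
    AS = countsketch_apply(rows, signs, m, S)
    Ab = countsketch_apply(rows, signs, m, b)
    x_tilde, *_ = np.linalg.lstsq(AS, Ab, rcond=None)
    return x_tilde


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, d = 5000, 5
    S = rng.standard_normal((n, d))
    b = S @ rng.standard_normal(d) + rng.standard_normal(n)
    x_opt, *_ = np.linalg.lstsq(S, b, rcond=None)
    x_tilde = sketch_and_solve(S, b, m=400)
    ratio = np.linalg.norm(S @ x_tilde - b) / np.linalg.norm(S @ x_opt - b)
    print(ratio)  # always >= 1; close to 1 when m is large relative to d
```

The ratio can never drop below 1, since $x_{\mathrm{opt}}$ minimizes the true residual; the question the lower bound in this paper addresses is how large $m$ must be for the ratio to stay within $1 + \varepsilon$ for every input when $s = 1$.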

Our Contribution III: In this work we show that any OSE such that all matrices in its support have $m$ rows and $s = 1$ non-zero entries per column must have $m = \Omega(d^2)$ if $n \ge 2d^2$. Thus for constant $\varepsilon$ and large $n$, the upper bound of [NN12] is optimal.

1.4 Organization 

In Section 2 we prove our lower bound for the sparsity required in JL matrices. In Section 3 we give our sparsity lower bound for RIP matrices, and in Section 4 we give our lower bound on the number of rows for OSEs having sparsity 1. In Section 5 we give a lower bound involving $\delta_k$ on the number of rows in an RIP matrix, and in Section 6 we state an open problem.

2 JL Sparsity Lower Bound 

Define an $\varepsilon$-incoherent matrix $A \in \mathbb{R}^{m\times n}$ as any matrix whose columns have unit $\ell_2$ norm, and such that every pair of columns has dot product at most $\varepsilon$ in magnitude. A simple observation of [Alo09] is that any JL matrix $A$ for the set of vectors $\{0, e_1, \ldots, e_n\} \subset \mathbb{R}^n$, when its columns are scaled by their $\ell_2$ norms, must be $O(\varepsilon)$-incoherent.

In this section, we consider an $\varepsilon$-incoherent matrix $A \in \mathbb{R}^{m\times n}$ with at most $s$ non-zero entries per column. We show a lower bound on $s$ in terms of $\varepsilon, n, m$. In particular if $m = O(\varepsilon^{-2}\log n)$ is the number of rows guaranteed by the JL lemma, we show that $s = \Omega(\varepsilon^{-1}\log n/\log(1/\varepsilon))$ as long as $m \le n/\mathrm{polylog}\, n$. In fact if all the entries in $A$ are either 0 or equal in magnitude, we show that the lower bound even holds up to $m \le n/10$.

In Section 2.1 we give the lower bound on $s$ in the case that all entries in $A$ are in $\{0, 1/\sqrt{s}, -1/\sqrt{s}\}$. In Section 2.2 we give our lower bound without making any assumption on the magnitudes of entries in $A$. Before proceeding further, we prove a couple of lemmas used throughout this section, and also later in this paper. Throughout this section $A$ is always an $\varepsilon$-incoherent matrix.

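Lemma 3. For any $x \ge 2\varepsilon$, $A$ cannot have any row with at least $5/x$ entries that are at least $\sqrt{x}$, nor can it have any row with at least $5/x$ entries that are at most $-\sqrt{x}$.

Proof. For the sake of contradiction, suppose $A$ did have such a row, say the $j$th row. Suppose $A_{j,i_1}, \ldots, A_{j,i_N} \ge \sqrt{x}$ for some $x \ge 2\varepsilon$, where $N \ge 5/x$ (the case where they are each at most $-\sqrt{x}$ is argued identically). Entries of a unit vector are at most 1 in magnitude, so $x \le 1$; moreover $x = 1$ would force the $N \ge 5$ columns $i_1, \ldots, i_N$ to all equal $e_j$, contradicting incoherence. Hence $x < 1$. Let $v_i$ denote the $i$th column of $A$. Let $u_i$ be $v_i$ but with the $j$th coordinate replaced with 0. Then for any distinct $k_1, k_2 \in [N]$,

$$\langle u_{i_{k_1}}, u_{i_{k_2}}\rangle = \langle v_{i_{k_1}}, v_{i_{k_2}}\rangle - A_{j,i_{k_1}}A_{j,i_{k_2}} \le \varepsilon - x \le -x/2.$$

Thus we have

$$0 \le \Big\|\sum_{k=1}^N u_{i_k}\Big\|_2^2 = \sum_{k=1}^N \|u_{i_k}\|_2^2 + \sum_{k_1 \ne k_2}\langle u_{i_{k_1}}, u_{i_{k_2}}\rangle \le N - xN(N-1)/4,$$

and rearranging gives $1/x \ge (N-1)/4$. Since $N \ge 5/x$ and $x < 1$, this yields the contradiction $1/x \ge (N-1)/4 \ge (5/x - 1)/4 > 1/x$. ■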

Lemma 4. Let $s, q, r$ be positive reals with $q/r \ge 2$ and $s \le q/e$. Then if $s\ln(q/s) \ge r$ it must be the case that $s = \Omega(r/\ln(q/r))$.

Proof. Define the function $f(s) = s\ln(q/s)$. Then $f'(s) = \ln(q/(es))$, which is positive for $s < q/e$, so $f$ is increasing on $(0, q/e]$. Since $q/r \ge 2$, for $s = cr/\ln(q/r)$ with constant $c > 0$ we have the equality

$$s\ln(q/s) = \frac{cr}{\ln(q/r)}\ln\left(\frac{(q/r)\ln(q/r)}{c}\right) = (c + o_{q/r}(1))\,r,$$

where the $o_{q/r}(1)$ term goes to zero as $q/r \to \infty$. Thus for $c$ sufficiently small the $c + o_{q/r}(1)$ term must be less than 1, so in order to have $f(s) \ge r$, since $f$ is increasing we must have $s = \Omega(r/\ln(q/r))$. ■
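Since $f(s) = s\ln(q/s)$ is increasing on $(0, q/e]$, the smallest $s$ allowed by the hypothesis $s\ln(q/s) \ge r$ can be found by bisection, which gives a quick numerical sanity check of Lemma 4. A minimal Python sketch (the helper name is hypothetical); it returns `None` when even the maximum value $f(q/e) = q/e$ falls short of $r$, in which case the hypothesis of the lemma cannot hold:

```python
import math


def min_feasible_s(q, r, iters=200):
    """Smallest s in (0, q/e] with s * ln(q/s) >= r, or None if none exists.

    f(s) = s * ln(q/s) is increasing on (0, q/e] (its derivative is
    ln(q/(e s)) > 0 there), so the feasible set is an interval [s*, q/e]
    and bisection finds s*."""
    def f(s):
        return s * math.log(q / s)

    hi = q / math.e  # f(q/e) = q/e is the maximum of f
    if f(hi) < r:
        return None
    lo = 0.0  # f(s) -> 0 as s -> 0+, so lo is treated as infeasible
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) >= r:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for q, r in [(1e6, 100.0), (1e9, 50.0), (1e4, 1000.0)]:
        s = min_feasible_s(q, r)
        # Lemma 4 predicts s / (r / ln(q/r)) is bounded below by a constant.
        print(q, r, s, s / (r / math.log(q / r)))
```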

2.1 Sign matrices 

In this section we consider the case that all entries of $A$ are either 0 or $\pm 1/\sqrt{s}$ and show a lower bound on $s$ in this case.

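Lemma 5. Suppose $m \le n/10$ and all entries of $A$ are in $\{0, 1/\sqrt{s}, -1/\sqrt{s}\}$. Then $s \ge 1/(2\varepsilon)$.

Proof. For the sake of contradiction suppose $s < 1/(2\varepsilon)$. Since each column has unit $\ell_2$ norm, it has exactly $s$ non-zero entries, so there are $ns$ non-zero entries in $A$, and thus at least $ns/2$ of these entries have the same sign by the pigeonhole principle; without loss of generality let us say $1/\sqrt{s}$ appears at least $ns/2$ times. Then again by pigeonhole some row $j$ of $A$ has $N \ge ns/(2m) \ge 5s$ values that are $1/\sqrt{s}$. The claim now follows by Lemma 3 with $x = 1/s$, since then $x > 2\varepsilon$ and $N \ge 5/x$. ■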

We now show how to improve the bound to the desired form. 
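Theorem 6. Suppose $m \le n/10$ and all entries of $A$ are in $\{0, 1/\sqrt{s}, -1/\sqrt{s}\}$. Then $s \ge \Omega(\varepsilon^{-1}\log n/\log(m/\log n))$.

Proof. We know $s \ge 1/(2\varepsilon)$ by Lemma 5. Let $t = 2\varepsilon s \ge 1$. Every $v_i$ has $\binom{s}{t}$ subsets of size $t$ of non-zero coordinates. Thus by pigeonhole there exists a set of $t$ rows and $N \ge n\binom{s}{t}/(2^t\binom{m}{t})$ columns $v_{i_1}, \ldots, v_{i_N}$ such that for each row all entries in those columns are $1/\sqrt{s}$ in magnitude and have the same sign (the signs may vary across rows). Letting $u_j$ be $v_j$ but with those $t$ coordinates set to 0, we have

$$\langle u_{i_{k_1}}, u_{i_{k_2}}\rangle = \langle v_{i_{k_1}}, v_{i_{k_2}}\rangle - t/s \le \varepsilon - t/s \le -t/(2s).$$

Thus we have

$$0 \le \Big\|\sum_{k=1}^N u_{i_k}\Big\|_2^2 \le N - tN(N-1)/(4s),$$

so that rearranging gives

$$s \ge t(N-1)/4 \ge (t/4)\cdot\left(\frac{n\binom{s}{t}}{2^t\binom{m}{t}} - 1\right) \ge (t/4)\cdot\left(n\left(\frac{s}{2em}\right)^t - 1\right),$$

where the last step uses $\binom{s}{t} \ge (s/t)^t$ and $\binom{m}{t} \le (em/t)^t$. Suppose $s \le c\varepsilon^{-1}\log n/\log(2em/s)$ for some small constant $c$, so that $n(s/(2em))^t \ge 2$ (if this fails, then $s\ln(2em/s) = \Omega(\varepsilon^{-1}\log n)$ already, and the application of Lemma 4 below gives the claim directly). Then

$$s \ge (tn/8)\cdot(s/(2em))^t.$$

Thus

$$\frac{\varepsilon n}{4} = \frac{tn}{8s} \le \left(\frac{2em}{s}\right)^t.$$

Taking the natural logarithm of both sides gives

$$s\ln\left(\frac{2em}{s}\right) \ge \frac{1}{2\varepsilon}\ln\left(\frac{\varepsilon n}{4}\right).$$

Define $q = 2em$ and $r = \varepsilon^{-1}\ln(\varepsilon n/4)/2$. Then $s \le q/e$, since $s \le m$. By [Alo09] we must have $m = \Omega(\varepsilon^{-2}\log n/\log(1/\varepsilon))$, so $q/r \ge 2$ for $\varepsilon$ smaller than some fixed constant. Thus by Lemma 4 we have $s = \Omega(r/\ln(q/r))$. The theorem follows since $\log(\varepsilon m/\log n) = \Theta(\log(m/\log n))$, as $m = \Omega(\varepsilon^{-2}\log n/\log(1/\varepsilon))$ [Alo09]. ■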
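The estimate $\binom{s}{t}/(2^t\binom{m}{t}) \ge (s/(2em))^t$ used in the proof of Theorem 6 follows from the standard bounds $\binom{s}{t} \ge (s/t)^t$ and $\binom{m}{t} \le (em/t)^t$. A small exact check with Python integer binomials over an arbitrary parameter grid (the helper name is hypothetical):

```python
import math


def ratio_estimate_holds(s, t, m):
    """Check C(s,t) / (2^t * C(m,t)) >= (s / (2 e m))^t by comparing logarithms.

    Binomials are exact integers; the small slack only absorbs floating-point
    error in the logarithms."""
    lhs = math.log(math.comb(s, t)) - t * math.log(2) - math.log(math.comb(m, t))
    rhs = t * math.log(s / (2 * math.e * m))
    return lhs >= rhs - 1e-9


if __name__ == "__main__":
    violations = [(s, t, m)
                  for m in (50, 200, 1000)
                  for s in range(1, 60)
                  for t in range(1, s + 1)
                  if s <= m and not ratio_estimate_holds(s, t, m)]
    print("violations:", violations)  # expected to be an empty list
```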

Corollary 7. Suppose $m \le \mathrm{poly}(1/\varepsilon)\cdot\log n \le n/10$ and all entries of $A$ are in $\{0, 1/\sqrt{s}, -1/\sqrt{s}\}$. Then $s \ge \Omega(\varepsilon^{-1}\log n/\log(1/\varepsilon))$.

2.2 General matrices 

We now consider arbitrary sparse and nearly orthogonal matrices $A \in \mathbb{R}^{m\times n}$. That is, we no longer require the non-zero entries of $A$ to be $1/\sqrt{s}$ in magnitude.

Lemma 8. Suppose $m \le n/(20\ln(1/(2\varepsilon)))$. Then $s \ge 1/(4\varepsilon)$.
Proof. For the sake of contradiction suppose $s < 1/(4\varepsilon)$. We know by Lemma 3 that for any $x \ge 2\varepsilon$, no row of $A$ can have $5/x$ or more entries of value at least $\sqrt{x}$ in magnitude and of the same sign. Define $S_i = \{j : A_{ij}^2 \ge 2\varepsilon\}$. Let $S_i^+$ be the subset of indices $j$ in $S_i$ with $A_{ij} > 0$, and define $S_i^- = S_i \setminus S_i^+$. Let $X$ denote $A_{ij}^2$ for $j$ chosen uniformly at random from $S_i^+$. Then, since $X \le 1$,

$$\sum_{j\in S_i^+} A_{ij}^2 = |S_i^+|\cdot\mathbb{E}X = |S_i^+|\int_0^1 \Pr(X \ge x)\,dx \le 2\varepsilon|S_i^+| + \int_{2\varepsilon}^1 \frac{5}{x}\,dx = 2\varepsilon|S_i^+| + 5\ln(1/(2\varepsilon)).$$

By analogously bounding the sum of squares of entries in $S_i^-$, we have that the sum of squares of entries at least $\sqrt{2\varepsilon}$ in magnitude is never more than $2\varepsilon|S_i| + 10\ln(1/(2\varepsilon))$ in the $i$th row of $A$, for any $i$. Thus the total sum of squares of all entries in the matrix less than $\sqrt{2\varepsilon}$ in magnitude is at most $2\varepsilon(ns - \sum_i |S_i|)$. Meanwhile the sum of squares of all other entries is at most $2\varepsilon\sum_i |S_i| + 10m\ln(1/(2\varepsilon))$. Thus the sum of squares of all entries in the matrix is at most $2\varepsilon ns + 10m\ln(1/(2\varepsilon)) < n/2 + 10m\ln(1/(2\varepsilon))$, by our assumption on $s$. This quantity must be $n$, since every column of $A$ has unit $\ell_2$ norm. However for our stated value of $m$ this is impossible since $10m\ln(1/(2\varepsilon)) \le n/2$, a contradiction. ■

We now show how to obtain the extra factor of $\log n/\log(1/\varepsilon)$ in the lower bound.

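Lemma 9. Let $0 < \varepsilon < 1/2$. Suppose $v_1, \ldots, v_n \in \mathbb{R}^m$ each have $\|v_i\|_2 = 1$ and $\|v_i\|_0 \le s$, and furthermore $|\langle v_i, v_j\rangle| \le \varepsilon$ for $i \ne j$. Then for any $t \in [s]$ with $t/s \ge C\varepsilon$, we must have $s \ge t(N-1)/(2C)$ with

$$N = \frac{n}{2^t\binom{m}{t}\binom{2(s+t)}{t}}, \qquad C = \frac{2}{1 - 1/\sqrt{2}}.$$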



Proof. We label each vector $v_i$ by its $t$-type, defined in the following way. The $t$-type of a vector $v_i$ is the set of locations of the $t$ largest coordinates in magnitude, as well as the signs of those coordinates, together with a rounding of those top $t$ coordinates so that their squares round to the nearest integer multiple of $1/(2s)$. In the rounding, values halfway between two multiples are rounded arbitrarily; say downward, to be concrete. Note that the amount of $\ell_2$ mass contained in the top $t$ coordinates of any $v_i$ after such a rounding is at most $1 + t/(2s)$, so the rounded squares are $a_1/(2s), \ldots, a_t/(2s)$ for nonnegative integers $a_r$ with $\sum_r a_r \le 2s + t$. The number of such tuples, and hence of possible roundings, is at most $\binom{2s+2t}{t} = \binom{2(s+t)}{t}$. Thus the total number of possible $t$-types is at most $2^t\binom{m}{t}\binom{2(s+t)}{t}$ ($\binom{m}{t}$ choices of the largest $t$ coordinates, $2^t$ choices of their signs, and $\binom{2(s+t)}{t}$ choices for how they round). Thus by the pigeonhole principle, there exist $N$ vectors $v_{i_1}, \ldots, v_{i_N}$ each with the same $t$-type such that $N \ge n/(2^t\binom{m}{t}\binom{2(s+t)}{t})$.

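Now for these vectors $v_{i_1}, \ldots, v_{i_N}$, let $S \subset [m]$ of size $t$ be the set of the largest coordinates (in magnitude) in each $v_{i_j}$. Define $u_{i_j} = (v_{i_j})_{[m]\setminus S}$; that is, we zero out the coordinates in $S$. For $r \in S$ the coordinates $(v_{i_j})_r$ and $(v_{i_k})_r$ have the same sign, and their squares round to the same multiple of $1/(2s)$ and hence differ by at most $1/(2s)$; since $|a - b| \le \sqrt{|a^2 - b^2|}$ for $a, b \ge 0$, this gives $|(v_{i_k})_r| \ge |(v_{i_j})_r| - 1/\sqrt{2s}$. Then for $j \ne k$,

$$\begin{aligned}
\langle u_{i_j}, u_{i_k}\rangle &= \langle v_{i_j}, v_{i_k}\rangle - \sum_{r\in S}(v_{i_j})_r(v_{i_k})_r\\
&\le \varepsilon - \sum_{r\in S}|(v_{i_j})_r|\left(|(v_{i_j})_r| - 1/\sqrt{2s}\right)\\
&\le \varepsilon - \|(v_{i_j})_S\|_2^2 + \sqrt{t/(2s)}\cdot\|(v_{i_j})_S\|_2\\
&\le \varepsilon - (1 - 1/\sqrt{2})\,t/s. \qquad (1)
\end{aligned}$$

The penultimate inequality follows by Cauchy-Schwarz, since $\sum_{r\in S}|(v_{i_j})_r| \le \sqrt{t}\,\|(v_{i_j})_S\|_2$. The last inequality used that $\|(v_{i_j})_S\|_2 \ge \sqrt{t/s}$ (the $t$ largest of at most $s$ non-zero coordinates of a unit vector carry at least a $t/s$ fraction of its squared $\ell_2$ mass), together with the fact that $y \mapsto -y^2 + \sqrt{t/(2s)}\,y$ is decreasing for $y \ge \sqrt{t/s}$. Also we pick $t$ to ensure

$$t/s \ge 2\varepsilon/(1 - 1/\sqrt{2}) = C\varepsilon,$$

so that the right hand side of Eq. (1) is at most $-((1 - 1/\sqrt{2})/2)\,t/s = -t/(Cs)$. Thus we have

$$0 \le \Big\|\sum_{j=1}^N u_{i_j}\Big\|_2^2 = \sum_{j=1}^N \|u_{i_j}\|_2^2 + \sum_{j\ne k}\langle u_{i_j}, u_{i_k}\rangle \le N - \frac{t}{Cs}\cdot\frac{N(N-1)}{2}, \qquad (2)$$

which implies $s \ge t(N-1)/(2C)$ by rearranging Eq. (2). ■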
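The $t$-type from the proof of Lemma 9 is easy to compute, and grouping vectors by it is exactly the pigeonhole step. A minimal Python sketch follows; the function names are hypothetical, and ties among equal magnitudes are broken by index, which the proof leaves unspecified.

```python
import math
from collections import defaultdict


def t_type(v, t, s):
    """(locations, signs, rounded squares) of the t largest-magnitude coordinates.

    Squares are rounded to the nearest multiple of 1/(2s), halfway cases
    downward, and stored as integers a with rounded value a / (2s)."""
    top = sorted(range(len(v)), key=lambda r: (-abs(v[r]), r))[:t]
    locs = tuple(sorted(top))
    signs = tuple(1 if v[r] >= 0 else -1 for r in locs)
    rounded = tuple(math.ceil(v[r] ** 2 * 2 * s - 0.5) for r in locs)
    return locs, signs, rounded


def group_by_type(vectors, t, s):
    """Bucket vector indices by t-type (the pigeonhole step)."""
    buckets = defaultdict(list)
    for idx, v in enumerate(vectors):
        buckets[t_type(v, t, s)].append(idx)
    return buckets


def type_count_bound(m, s, t):
    """2^t * C(m, t) * C(2(s + t), t), the bound on the number of t-types."""
    return 2 ** t * math.comb(m, t) * math.comb(2 * (s + t), t)


if __name__ == "__main__":
    s = 4
    a = 1 / math.sqrt(s)
    vecs = [[a, a, -a, a, 0, 0], [a, a, -a, 0, a, 0], [0, a, a, a, a, 0]]
    buckets = group_by_type(vecs, t=3, s=s)
    print(len(buckets), "types used, bound:", type_count_bound(6, s, 3))
```

With more than `type_count_bound(m, s, t)` vectors some bucket must hold at least two of them, and in general one bucket holds at least $n$ divided by the bound, which is the quantity $N$ in Lemma 9.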

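Theorem 10. There is some fixed $0 < \varepsilon_0 < 1/2$ so that the following holds. Let $0 < \varepsilon < \varepsilon_0$. Suppose $v_1, \ldots, v_n \in \mathbb{R}^m$ each have $\|v_i\|_2 = 1$ and $\|v_i\|_0 \le s$, and furthermore $|\langle v_i, v_j\rangle| \le \varepsilon$ for $i \ne j$. Then $s \ge \Omega(\varepsilon^{-1}\log n/\log(m/\log n))$ as long as $m \le O(n/\ln(1/\varepsilon))$.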

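Proof. By Lemma 8, $4\varepsilon s \ge 1$. Set $t = 7\varepsilon s$ so that Lemma 9 applies (note $7 > C \approx 6.83$). Then by Lemma 9, as long as $2^t\binom{m}{t}\binom{2(s+t)}{t} \le n/2$ (so that $N \ge 2$ and hence $N - 1 \ge N/2$),

$$7\varepsilon n = \frac{tn}{s} \le 4C\cdot 2^t\binom{m}{t}\binom{2(s+t)}{t} \le 4C\left(\frac{8e^2 m}{49\varepsilon^2 s}\right)^{7\varepsilon s},$$

where $C$ is as in Lemma 9, and the last step uses $\binom{a}{t} \le (ea/t)^t$ and $s + t \le 2s$. Taking the natural logarithm on both sides,

$$\ln(7\varepsilon n/(4C)) \le (7\varepsilon s)\ln\left(\frac{8e^2 m}{49\varepsilon^2 s}\right).$$

In other words,

$$s \ge \frac{\ln(7\varepsilon n/(4C))}{7\varepsilon\ln\left(\frac{8e^2 m}{49\varepsilon^2 s}\right)}.$$

Define $r = \ln(7\varepsilon n/(4C))/(7\varepsilon)$ and $q = 8e^2m/(49\varepsilon^2)$. Thus we have $s\ln(q/s) \ge r$. We have that $s \le q/e$ is always the case for $\varepsilon < 1/2$ since then $q/e \ge m$, and we have that $s \le m$. Also note for $\varepsilon$ smaller than some constant we have that $q/r \ge 2$ since $m = \Omega(\log n)$ by [Alo09]. Thus by Lemma 4 we have $s \ge \Omega(r/\ln(q/r))$. Using that $\ln(\varepsilon n) = \Theta(\log n)$ since $\varepsilon \ge 1/\sqrt{n}$, and that $2^t\binom{m}{t}\binom{2(s+t)}{t} \le (8e^2m/(49\varepsilon^2 s))^t \le n/2$ for our setting of $t$ when $s = o(\varepsilon^{-1}\log n/\log(m/(\varepsilon^{-1}\log n)))$, gives $s = \Omega(\varepsilon^{-1}\log n/\log(\varepsilon^{-1}m/\log n))$. Since $m = \Omega(\varepsilon^{-2}\log n/\log(1/\varepsilon))$ [Alo09], this is equivalent to our lower bound in the theorem statement. ■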
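The estimate $2^t\binom{m}{t}\binom{2(s+t)}{t} \le (8e^2m/(49\varepsilon^2 s))^t$ with $t = 7\varepsilon s$, used in the proof of Theorem 10, follows from $\binom{a}{t} \le (ea/t)^t$ and $s + t \le 2s$. It can be spot-checked exactly in log space; in this sketch $\varepsilon$ is set to $t/(7s)$ so that $t = 7\varepsilon s$ is an integer, the helper name is hypothetical, and the parameter grid is arbitrary.

```python
import math


def type_bound_estimate_holds(m, s, t):
    """Check 2^t C(m,t) C(2(s+t),t) <= (8 e^2 m / (49 eps^2 s))^t with eps = t/(7s).

    Requires 1 <= t <= s <= m. Binomials are exact integers; logarithms are
    compared with a tiny slack for floating-point error."""
    eps = t / (7 * s)
    lhs = (t * math.log(2) + math.log(math.comb(m, t))
           + math.log(math.comb(2 * (s + t), t)))
    rhs = t * math.log(8 * math.e ** 2 * m / (49 * eps ** 2 * s))
    return lhs <= rhs + 1e-9


if __name__ == "__main__":
    ok = all(type_bound_estimate_holds(m, s, t)
             for s in range(1, 80)
             for t in range(1, s + 1)
             for m in (s, 3 * s, 50 * s))
    print("estimate holds on grid:", ok)
```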

Corollary 11. Let $\varepsilon, m, s$ be as in Theorem 10. Then $s = \Omega(\varepsilon^{-1}\log n/\log(1/\varepsilon))$ as long as $m \le \mathrm{poly}(1/\varepsilon)\cdot\log n \le O(n/\ln(1/\varepsilon))$.

Remark 12. From Theorem 10 we can deduce that for constant $\varepsilon$, in order for the sparsity $s$ to be a constant independent of $n$, it must be the case that $m = n^{\Omega(1)}$. This fact rules out very sparse mappings even when we significantly increase the target dimension.
3 RIP Sparsity Lower Bound



Consider a $k$-RIP matrix $A \in \mathbb{R}^{m\times n}$ with distortion $\delta_k$ where each column has at most $s$ non-zero entries. We will show for $\delta_k = \Theta(1)$ that $s$ cannot be very small when $m$ has the optimal number of rows $O(k\log(n/k))$.

Theorem 13. Assume $k \ge 2$, $\delta_k < \delta$ for some fixed universal small constant $\delta > 0$, and $m < n/(64\log^3 n)$. Then we must have $s = \Omega(\min\{k\log(n/k)/\log(m/(k\log(n/k))), m\})$.

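Proof. Assume for the sake of contradiction that $s < \min\{k\log(n/k)/(64\log(m/s)), m/64\}$. Consider the $i$th column of $A$ for some fixed $i$. By $k$-RIP, the $\ell_2$ norm of each column of $A$ is at least $1 - \delta_k$, so its squared $\ell_2$ norm is at least $(1-\delta_k)^2 \ge 1/2$. The at most $s$ non-zero entries of magnitude less than $1/(2\sqrt{s})$ contribute at most $s\cdot 1/(4s) = 1/4$ to this, so the sum of squares of entries of magnitude at least $1/(2\sqrt{s})$ is at least $1/4$. Therefore, there exists a scale $1 \le t \le \log s$ such that the number of entries of absolute value greater than or equal to $2^{(t-3)/2}/\sqrt{s}$ is at least $2^{-t-1}s/t^2$. To see this, let $S$ be the set of rows $j$ such that $|A_{ji}| \ge 1/(2\sqrt{s})$. For the sake of contradiction, suppose that every scale $1 \le t \le \log s$ has strictly fewer than $2^{-t-1}s/t^2$ values that are at least $2^{(t-3)/2}/\sqrt{s}$ in magnitude. The scale $t = 1$ then gives $|S| < s/4$, and the largest scale $t = \lfloor\log s\rfloor$ allows fewer than one such value, so $A_{ji}^2 < 2^{\lfloor\log s\rfloor-3}/s \le 1/8$ for all $j$. Let $X$ be the square of a uniformly random element of $S$. Then

$$\sum_{j\in S}A_{ji}^2 = |S|\cdot\mathbb{E}X = |S|\int_0^\infty\Pr(X>x)\,dx \le \frac{|S|}{2s} + \sum_{t=2}^{\lfloor\log s\rfloor}|S|\int_{2^{t-3}/s}^{2^{t-2}/s}\Pr(X>x)\,dx < \frac18 + \sum_{t=2}^{\infty}\frac{2^{-t-1}s}{t^2}\cdot\frac{2^{t-3}}{s} < \frac14,$$

a contradiction. Let a pattern at scale $t$ be a subset of $[m]$ of size $u = \max\{2^{4-t}s/k, 1\}$ along with $u$ signs. There are at least $\binom{2^{-t-1}s/t^2}{u}$ patterns $P$ such that $A_{vi}^2 \ge 2^{t-3}/s$ for all $v \in P$ and the signs of the $A_{vi}$ match the signs of $P$.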
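The dyadic-scale argument above can be spot-checked numerically. The Python sketch below (the function names are hypothetical) takes a column with at most $s$ non-zeros and squared norm at least $1/2$ and returns the first integer scale $1 \le t \le \lfloor\log_2 s\rfloor$ having at least $2^{-t-1}s/t^2$ entries of magnitude at least $2^{(t-3)/2}/\sqrt{s}$; the argument guarantees that such a scale exists. The heavy-tailed random test columns are an arbitrary choice for illustration.

```python
import math
import random


def find_scale(column, s):
    """Return (t, count) for the first dyadic scale meeting the counting condition, or None."""
    top = max(1, int(math.log2(s)))  # scales t = 1, ..., floor(log2 s)
    for t in range(1, top + 1):
        threshold = 2 ** ((t - 3) / 2) / math.sqrt(s)
        count = sum(1 for a in column if abs(a) >= threshold)
        if count >= 2 ** (-t - 1) * s / t ** 2:
            return t, count
    return None


def random_sparse_unit_column(m, s, rng):
    """An s-sparse column of unit l2 norm with heavy-tailed magnitudes on a random support."""
    column = [0.0] * m
    for j in rng.sample(range(m), s):
        column[j] = rng.choice([-1, 1]) * rng.paretovariate(1.5)
    norm = math.sqrt(sum(a * a for a in column))
    return [a / norm for a in column]


rng = random.Random(0)
for s in (2, 3, 7, 16, 100, 500):
    for _ in range(100):
        col = random_sparse_unit_column(2 * s, s, rng)
        assert find_scale(col, s) is not None
print("a valid scale was found for every sampled column")
```

A perfectly flat column is the easy extreme (every entry passes the $t = 1$ threshold), while spiky columns exercise the larger scales.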

There are $2^u\binom{m}{u}$ possible patterns at scale $t$. By an averaging argument, there exist a scale $t$ and a pattern $P$ such that the number of columns of $A$ with this pattern is at least

$$z = \frac{n\binom{2^{-t-1}s/t^2}{u}}{(\log s)\,2^u\binom{m}{u}}.$$

Consider 2 cases.



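Case 1 ($z \ge k$): Pick an arbitrary set of $k$ such columns. Consider the vector $v$ with $k$ ones at locations corresponding to those columns and zeroes everywhere else. We have $\|v\|_2^2 = k$, and for each $j \in P$ the $k$ selected entries in row $j$ all have the same sign and magnitude at least $2^{(t-3)/2}/\sqrt{s}$, so

$$(Av)_j^2 \ge k^2 2^{t-3}/s.$$

Thus, since $u \ge 2^{4-t}s/k$,

$$\|Av\|_2^2 \ge uk^2 2^{t-3}/s \ge 2k.$$

This contradicts the assumption that $\|Av\|_2 \le (1+\delta_k)\|v\|_2$.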
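The key point in Case 1 is that entries sharing the pattern's signs add coherently. A minimal Python illustration, with hypothetical parameters: the magnitude floor `c` stands in for $2^{(t-3)/2}/\sqrt{s}$, and the off-pattern entries are arbitrary. Summing $k$ columns that agree with a pattern on $u$ rows gives $\|Av\|_2^2 \ge uk^2c^2$.

```python
import random


def coherent_energy(m, k, pattern_rows, signs, c, seed=0):
    """||A v||_2^2 where v is the all-ones vector on k columns that all match the pattern."""
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        column = [rng.uniform(-0.1, 0.1) for _ in range(m)]  # arbitrary off-pattern entries
        for row, sign in zip(pattern_rows, signs):
            column[row] = sign * (c + rng.uniform(0.0, 0.5))  # magnitude >= c, sign from pattern
        columns.append(column)
    av = [sum(column[j] for column in columns) for j in range(m)]  # A v = sum of the k columns
    return sum(x * x for x in av)


m, k, c = 50, 8, 0.2
rows, signs = [3, 17, 29], [1, -1, 1]
energy = coherent_energy(m, k, rows, signs, c)
# Each pattern row contributes at least (k c)^2; the other rows only add non-negative terms.
assert energy >= len(rows) * k ** 2 * c ** 2 - 1e-9
print(energy)
```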



Case 2 ($z < k$): Consider the vector $v$ with $z$ ones at locations corresponding to those columns and zeroes everywhere else. We have $\|v\|_2^2 = z$, and for each $j \in P$ we have $(Av)_j^2 \ge z^2 2^{t-3}/s$. Consider 2 subcases.



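Case 2.1 ($u = 1$): Then $z = n2^{-t-2}s/(t^2(\log s)m)$, so

$$\|Av\|_2^2 \ge \frac{z^2 2^{t-3}}{s} = \frac{n}{32t^2(\log s)m}\cdot z \ge 2z, \qquad (3)$$

where the last step uses $t \le \log s \le \log n$ and $m < n/(64\log^3 n)$. This contradicts the assumption that $\|Av\|_2 \le (1+\delta_k)\|v\|_2$.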
Case 2.2 ($u = 2^{4-t}s/k$): We have

$$\begin{aligned}
z &= \frac{n\binom{2^{-t-1}s/t^2}{u}}{(\log s)\,2^u\binom{m}{u}}\\
&\ge \frac{n}{\log s}\,2^{-(\log(m/s)+\log e+t+2+2\log t)\,2^{4-t}s/k}\\
&\ge \frac{n}{\log s}\,2^{-(\log(m/s)+\log e+t+2+2\log t)\,2^{4-t}\log(n/k)/(64\log(m/s))} &&(4)\\
&\ge \frac{n}{\log s}\left(\frac{k}{n}\right)^{1/4} &&(5)\\
&\ge k. &&(6)
\end{aligned}$$

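The first inequality uses $\binom{a}{u} \ge (a/u)^u$ and $\binom{m}{u} \le (em/u)^u$. Eq. (4) follows from $s < k\log(n/k)/(64\log(m/s))$. Eq. (5) follows from the fact that $f(t) = (\log(m/s) + \log e + t + 2 + 2\log t)2^{-t}$ is monotonically decreasing for $t \ge 1$, together with $f(1) = (\log(m/s) + \log e + 3)/2 \le \log(m/s)$, which holds since $\log(m/s) > 6$. Indeed,

$$f'(t) = 2^{-t}\left(-\ln 2\,\big(\log(m/s) + \log e + 2 + t + 2\log t\big) + \frac{2}{t\ln 2} + 1\right) \le 2^{-t}\left(-9\ln 2 - t\ln 2 + \frac{2}{t\ln 2} + 1\right) < 0.$$

Eq. (6) follows since $k < n/\log^{4/3} n \le n/\log^{4/3} s$, which holds since $k \le m < n/(64\log^3 n)$. This contradicts the assumption of Case 2 that $z < k$.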
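The two numerical facts behind Eqs. (4) and (5) can be spot-checked in Python. The sketch below (hypothetical helper names; $\log$ is base 2 as in the text) checks, for integer $a = 2^{-t-1}s/t^2$, that $\binom{a}{u}/(2^u\binom{m}{u}) \ge 2^{-(\log(m/s)+\log e+t+2+2\log t)u}$, and that $f(t)$ is decreasing with $f(1) \le \log(m/s)$ once $m/s \ge 64$. The parameter grids are arbitrary.

```python
import math


def log2_ratio(a, u, m):
    """log2 of C(a, u) / (2^u * C(m, u))."""
    return math.log2(math.comb(a, u)) - u - math.log2(math.comb(m, u))


def log2_claimed_lower_bound(m, s, t, u):
    """log2 of 2^{-(log(m/s) + log e + t + 2 + 2 log t) u}."""
    return -(math.log2(m / s) + math.log2(math.e) + t + 2 + 2 * math.log2(t)) * u


def f(t, m_over_s):
    return (math.log2(m_over_s) + math.log2(math.e) + t + 2 + 2 * math.log2(t)) * 2.0 ** (-t)


def check_case_2_2():
    for t in range(1, 5):
        for w in (1, 3, 10):
            s = 2 ** (t + 1) * t * t * w  # makes a = 2^{-t-1} s / t^2 = w an integer
            a = w
            for m in (64 * s, 1000 * s):
                for u in range(1, a + 1):
                    if log2_ratio(a, u, m) < log2_claimed_lower_bound(m, s, t, u) - 1e-9:
                        return False
    for m_over_s in (64, 100, 10**4):
        values = [f(t, m_over_s) for t in range(1, 40)]
        if any(later >= earlier for earlier, later in zip(values, values[1:])):
            return False
        if f(1, m_over_s) > math.log2(m_over_s):
            return False
    return True


print(check_case_2_2())
```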

Thus we have $s \ge \min\{k\log(n/k)/(64\log(m/s)), m/64\}$. If $s \ge m/64$ we are done. Otherwise we have $s \ge k\log(n/k)/(64\log(m/s))$. Define $q = m$ and $r = k\log(n/k)/64$. Thus we have $s\log(q/s) \ge r$. We have $q/r \ge 2$ for $\delta_k$ smaller than some constant by Theorem 20, and we have $s < q/e = m/e$ since we are in the case $s < m/64$. Thus by Lemma ?? we have $s = \Omega(r/\ln(q/r))$, which completes the proof of the theorem. ■
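Both proofs finish by converting an inequality of the form $s\ln(q/s) \ge r$ (with $s < q/e$ and $q/r \ge 2$) into $s = \Omega(r/\ln(q/r))$. The Python sketch below illustrates this numerically: since $s \mapsto s\ln(q/s)$ is increasing on $(0, q/e)$, the smallest admissible $s$ can be found by bisection, and on the tested grid it stays above $r/(2\ln(q/r))$. The constant $1/2$, the grid, and the function name are our own choices for the illustration, not taken from the lemma.

```python
import math


def min_admissible_s(q, r, iters=200):
    """Smallest s in (0, q/e] with s*ln(q/s) >= r, found by bisection (None if none exists)."""
    def g(s):
        return s * math.log(q / s)  # increasing on (0, q/e), with maximum q/e at s = q/e

    hi = q / math.e
    if g(hi) < r:
        return None
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) >= r:
            hi = mid
        else:
            lo = mid
    return hi


for r in (1.0, 50.0, 1e4):
    for ratio in (3.0, 10.0, 100.0, 1e6):
        q = ratio * r
        s_star = min_admissible_s(q, r)
        bound = r / (2 * math.log(q / r))
        print(f"q/r={ratio:g}: s*={s_star:.4g}, r/(2 ln(q/r))={bound:.4g}")
        assert s_star >= bound
```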

Corollary 14. When $k \ge 2$, $\delta_k < \delta$ for some universal constant $\delta > 0$, and the number of rows is $m = O(k\log(n/k)) < n/(32\log^3 n)$, we must have $s = \Omega(k\log(n/k))$.

Remark 15. The restriction $m = O(n/\log^3 n)$ in Theorem 13 was relevant in Eq. (3). Note the choice of $t^2$ in the proof was just so that $\sum_t 1/t^2$ converges. We could instead have chosen $t^{1+\gamma}$ and obtained a qualitatively similar result, but with the slightly milder restriction $m = O(n/\log^{2+\gamma} n)$, where $\gamma > 0$ can be chosen as an arbitrary constant.



4 Oblivious Subspace Embedding Sparsity Lower Bound 

In this section, we show a lower bound on the dimension of very sparse OSEs.

Theorem 16. Let $d$ be at least a large enough constant and $n \ge 2d^2$. Any OSE with matrices $A$ in its support having $m$ rows and at most 1 non-zero entry per column such that, with probability at least $1/5$, the lengths of all vectors in a fixed subspace of dimension $d$ of $\mathbb{R}^n$ are preserved up to a factor 2, must have $m \ge d^2/214$.
Proof. Assume for the sake of contradiction that $m < d^2/214$. By Yao's minimax principle, we only need to show that there exists a distribution over subspaces such that any fixed matrix $A$ with column sparsity 1 and too few rows fails to preserve lengths of vectors in the subspace with probability more than $4/5$.

Consider the uniform distribution over subspaces spanned by $d$ standard basis vectors in $\mathbb{R}^n$: $e_{i_1}, e_{i_2}, \ldots, e_{i_d}$ with distinct $i_1, \ldots, i_d \in \{1, \ldots, n\}$. Let $a(i)$ be the row of the non-zero entry in column $i$ of $A$ and $b(j)$ be the number of non-zeroes in row $j$. We say $i$ collides with $j$ if $a(i) = a(j)$. Let the set of heavy rows be the set of rows $j$ such that $b(j) \ge n/(10m)$.

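Suppose we pick $i_1, \ldots, i_d$ one by one. Conditioned on $i_1, \ldots, i_{t-1}$, the probability that $a(i_t)$ is heavy is at least $1 - (n/10)/(n-t+1) \ge 4/5$, because the light rows together contain fewer than $m\cdot n/(10m) = n/10$ columns and $n - t + 1 \ge n/2$. Therefore, by a Chernoff bound, with probability at least $9/10$, the number of indices $i_t$ such that $a(i_t)$ is heavy is at least $3d/4$.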

We will show that, conditioned on the number of such $i_t$ being at least $3d/4$, with probability at least $9/10$ two such indices collide. Let $j_1, \ldots, j_{3d/4}$ be indices with $b(a(j_t)) \ge n/(10m)$. Conditioned on $a(j_1), \ldots, a(j_{t-1})$ (and on no collision so far), the probability that $j_t$ does not collide with any previous index is at most

$$1 - \sum_{u=1}^{t-1}\frac{b(a(j_u))}{n-t+1} + \frac{t-1}{n-t+1} \le e^{-\sum_{u=1}^{t-1}b(a(j_u))/n + 2(t-1)/n} \le e^{-(t-1)/(10m) + 2(t-1)/n}.$$



Thus, using $m < d^2/214$ and $n \ge 2d^2$, the probability that no collision occurs is at most $e^{-(3d/4)^2/(40m) + (3d/4)^2/n} < 1/10$. In other words, a collision occurs with probability at least $9/10$. When a collision occurs, the matrix $AM$, where $M$ is the $n\times d$ matrix whose columns are $e_{i_1}, \ldots, e_{i_d}$, has at most $d - 1$ non-zero rows, so it has rank at most $d - 1$. Therefore, with probability at least $(9/10)^2 > 4/5$, $A$ maps some non-zero vector in the subspace to the zero vector (any vector $Mx$ for non-zero $x \in \ker(AM)$) and fails to preserve the length of all vectors in the subspace. ■
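The final numerical step and the collision phenomenon can be illustrated in Python. The first function evaluates the bound $e^{-(3d/4)^2/(40m) + (3d/4)^2/n}$ at the extreme allowed values $m = d^2/214$ and $n = 2d^2$. The second is a Monte Carlo estimate, for one hypothetical matrix $A$ whose column-to-row map $a(\cdot)$ is drawn uniformly at random, of how often $d$ random columns hit at most $d - 1$ distinct rows (so that $AM$ is rank deficient). Parameters, seed, and function names are illustrative assumptions; the theorem itself covers every fixed $A$.

```python
import math
import random


def no_collision_bound(d, m, n):
    """exp(-(3d/4)^2 / (40 m) + (3d/4)^2 / n), the bound on the no-collision probability."""
    h = 0.75 * d
    return math.exp(-h * h / (40 * m) + h * h / n)


def rank_deficiency_rate(n, m, d, trials, seed=0):
    """Fraction of trials in which d distinct random columns share some row under a(i)."""
    rng = random.Random(seed)
    a = [rng.randrange(m) for _ in range(n)]  # row of the single non-zero in each column
    failures = 0
    for _ in range(trials):
        columns = rng.sample(range(n), d)   # the subspace span(e_{i_1}, ..., e_{i_d})
        rows_hit = {a[i] for i in columns}
        if len(rows_hit) < d:                # A M has at most d - 1 non-zero rows
            failures += 1
    return failures / trials


for d in (100, 1000, 10**5):
    assert no_collision_bound(d, d * d / 214, 2 * d * d) < 0.1

d = 300
m = d * d // 215     # 418 rows, below d^2 / 214
n = 2 * d * d        # 180000 columns
print(rank_deficiency_rate(n, m, d, trials=200))
```

With these parameters the exponent in `no_collision_bound` is about $-3.01 + 0.28$, independent of $d$, which is why the bound stays below $1/10$.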

5 Lower Bound on Number of Rows for RIP Matrices 

In this section we show a lower bound on the number of rows of any $k$-RIP matrix with distortion $\delta_k$. First we need the following form of the Chernoff bound.

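Theorem 17 (Chernoff bound). Let $X_1, \ldots, X_n$ be independent random variables, each at most $K$ in magnitude almost surely, with $\sum_{i=1}^n \mathbb{E}X_i = \mu$ and $\mathrm{Var}[\sum_{i=1}^n X_i] = \sigma^2$. Then

$$\forall \lambda > 0,\quad \Pr\left[\Big|\sum_{i=1}^n X_i - \mu\Big| > \lambda\sigma\right] < C\cdot\max\left\{e^{-c\lambda^2},\ \left(\frac{\lambda K}{\sigma}\right)^{-c\lambda\sigma/K}\right\}$$

for some absolute constants $c, C > 0$.

This form of the Chernoff bound can then be used to show the existence of a large error-correcting code with high relative distance.

Lemma 18. For any $0 < \varepsilon < 1/2$ and integers $k, n$ with $1 \le k \le \varepsilon n/2$, there exists a $q$-ary code with $q = n/k$ and block length $k$ of relative distance $1 - \varepsilon$, and with size at least

$$N = \min\left\{e^{C\varepsilon^2 n},\ e^{C\varepsilon k\log(\varepsilon n/k)}\right\}$$

for some absolute constant $C > 0$.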
Proof. We take a random code. That is, pick $N$ codewords with alphabet size $q = n/k$ and block length $k$ uniformly at random, with replacement. Now, look at two of these randomly chosen codewords. For $i = 1, \ldots, k$, let $X_i$ be an indicator random variable for the event that the $i$th symbol is equal in the two codewords. Then $X = \sum_{i=1}^k X_i$ is the number of positions at which these two codewords agree, and $\mathbb{E}X = k^2/n \le \varepsilon k/2$ and $\mathrm{Var}[X] \le k^2/n$. Thus by the Chernoff bound,

$$\Pr(X > \varepsilon k) \le C\cdot\max\left\{e^{-c\varepsilon^2 n},\ e^{-c\varepsilon k\log(\varepsilon n/k)}\right\}.$$

Therefore by a union bound over the fewer than $N^2$ pairs, a random multiset of $N$ codewords has relative distance $1 - \varepsilon$ with positive probability (in which case it must also clearly be not just a multiset, but a set). ■
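A small simulation of the random-code argument in Lemma 18, as a Python sketch with arbitrarily chosen parameters ($q = 1000$, block length $k = 20$, $\varepsilon = 1/4$, $N = 200$ codewords) and a fixed seed. It draws $N$ uniformly random words and reports the largest number of positions on which two of them agree, which must be at most $\varepsilon k$ for relative distance $1 - \varepsilon$. The helper names are hypothetical.

```python
import itertools
import random


def random_code(num_words, q, k, seed=0):
    """num_words codewords of length k over the alphabet {0, ..., q-1}, drawn with replacement."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(q) for _ in range(k)) for _ in range(num_words)]


def max_agreement(code):
    """Largest number of positions on which two codewords (distinct by index) agree."""
    return max(sum(x == y for x, y in zip(c1, c2))
               for c1, c2 in itertools.combinations(code, 2))


q, k, eps, num_words = 1000, 20, 0.25, 200
code = random_code(num_words, q, k)
worst = max_agreement(code)
print(worst, "agreements; relative distance 1 - eps allows at most", eps * k)
```

For a single pair, agreeing in 6 or more of the 20 positions has probability roughly $\binom{20}{6}10^{-18} \approx 4\cdot 10^{-14}$, so with about $2\cdot 10^4$ pairs the union bound leaves a negligible failure probability.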

Before proving the main theorem of this section, we also need the following theorem of Alon [Alo09].

Theorem 19 (Alon [Alo09]). Let $x_1, \ldots, x_N \in \mathbb{R}^n$ be such that $\|x_i\|_2 = 1$ for all $i$ and $|\langle x_i, x_j\rangle| \le \varepsilon$ for all $i \neq j$, where $1/\sqrt{N} < \varepsilon < 1/2$. Then $n = \Omega(\varepsilon^{-2}\log N/\log(1/\varepsilon))$.

Theorem 20. For any $0 < \delta_k < 1/2$ and integers $k, n$ with $1 \le k \le \delta_k n/2$, any $k$-RIP matrix with distortion $\delta_k$ must have $\Omega(\min\{n/\log(1/\delta_k), (k/(\delta_k\log(1/\delta_k)))\log(n/k)\})$ rows.

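Proof. Let $C_1, \ldots, C_N$ be a code as in Lemma 18 with block length $k/2$ and alphabet size $n/(k/2)$, with

$$N \ge \min\left\{e^{C\delta_k^2 n},\ e^{C\delta_k k\log(\delta_k n/k)}\right\}.$$

Consider a set of vectors $y_1, \ldots, y_N$ in $\mathbb{R}^n$ defined as follows. For $j = 0, \ldots, k/2 - 1$, we define $(y_i)_{2jn/k + (C_i)_j} = \sqrt{2/k}$, and all other coordinates of $y_i$ are 0. Then $\|y_i\|_2 = 1$ for all $i$, and also $0 \le \langle y_i, y_j\rangle \le \delta_k$ for all $i \neq j$ (the inner product is $2/k$ times the number of positions where $C_i$ and $C_j$ agree), and thus $2 - 2\delta_k \le \|y_i - y_j\|_2^2 \le 2$. Since $y_i$ is $k/2$-sparse and $y_i - y_j$ is $k$-sparse for all $i, j$, we have for any $k$-RIP matrix $A$ with distortion $\delta_k$

$$\forall i\ \ \|Ay_i\|_2 = 1 \pm \delta_k, \qquad \forall i \neq j\ \ \|Ay_i - Ay_j\|_2^2 = (1 \pm \delta_k)^2\cdot(2 \pm 2\delta_k) = 2 \pm 9\delta_k.$$

Thus if we define $x_1, \ldots, x_N$ by $x_i = Ay_i/\|Ay_i\|_2$, then by the polarization identity $\langle Ay_i, Ay_j\rangle = (\|Ay_i\|_2^2 + \|Ay_j\|_2^2 - \|Ay_i - Ay_j\|_2^2)/2 = O(\delta_k)$, so the $x_i$ satisfy the requirements of Theorem 19 with inner products at most $O(\delta_k)$ in magnitude. The lower bound on the number of rows of $A$ then follows. ■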
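The vectors $y_i$ in the proof of Theorem 20 are easy to build explicitly. The Python sketch below (assuming $k$ even and $2n/k$ an integer; names are hypothetical, and symbols and blocks are indexed from 0) stores each $y_i$ as a dictionary from coordinate to value and confirms that $\|y_i\|_2 = 1$ and that $\langle y_i, y_j\rangle$ equals $2/k$ times the number of agreeing code positions.

```python
import math
import random


def code_to_vectors(code, n, k):
    """Map codewords of length k/2 over an alphabet of size 2n/k to sparse vectors y_i in R^n."""
    block = 2 * n // k        # alphabet size n/(k/2); block j occupies [j*block, (j+1)*block)
    value = math.sqrt(2 / k)  # k/2 non-zeros of this value give unit norm
    vectors = []
    for word in code:
        assert len(word) == k // 2 and all(0 <= c < block for c in word)
        vectors.append({j * block + c: value for j, c in enumerate(word)})
    return vectors


def inner(y, z):
    return sum(val * z.get(idx, 0.0) for idx, val in y.items())


n, k = 400, 20
rng = random.Random(1)
code = [tuple(rng.randrange(2 * n // k) for _ in range(k // 2)) for _ in range(30)]
ys = code_to_vectors(code, n, k)
for i, y in enumerate(ys):
    assert abs(inner(y, y) - 1.0) < 1e-12
    for j in range(i + 1, len(ys)):
        agreements = sum(a == b for a, b in zip(code[i], code[j]))
        assert abs(inner(y, ys[j]) - 2 * agreements / k) < 1e-12
print("norms and inner products match the code structure")
```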

It is also possible to obtain a lower bound on the number of rows of $A$ in Theorem 20 of the form $\Omega(\delta_k^{-2}k/\log(1/\delta_k))$. This is because a theorem of [KW11] shows that any such RIP matrix with $k = \Theta(\log n)$, when its column signs are flipped randomly, is a JL matrix for any set of $n$ points with high probability. We then know from Theorem 19 that a JL matrix must have $m = \Omega(\delta_k^{-2}\log n/\log(1/\delta_k))$ rows, which is $\Omega(\delta_k^{-2}k/\log(1/\delta_k))$.

Corollary 21. Suppose $1/\sqrt{n} < \delta_k < 1/2$ and $A \in \mathbb{R}^{m\times n}$ is a $k$-RIP matrix with distortion $\delta_k$. Then $m = \Omega(\log^{-1}(1/\delta_k)\cdot\min\{k\log(n/k)/\delta_k + k/\delta_k^2, n\})$.
6 Future Directions



For several applications the JL lemma is used as a black box to obtain dimensionality-reducing linear maps for other problems. For example, applying the JL lemma with distortion $O(\delta_k)$ on a certain net with $N = \binom{n}{k}\cdot O(1/\delta_k)^k$ vectors yields a $k$-RIP matrix with distortion $\delta_k$ [BDDW08]. Note in this case, for constant $\delta_k$, the number of rows one obtains is the optimal $\Theta(\log N) = \Theta(k\log(n/k))$. Applying the distributional JL lemma with distortion $O(\varepsilon)$ to a certain net of size $2^{O(d)}$ yields an OSE with $m = O(d/\varepsilon^2)$ rows to preserve $d$-dimensional subspaces (see [CW12, Fact 10], based on [AHK06]).
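For completeness, the count $\log N = \Theta(k\log(n/k))$ for constant $\delta_k$ follows from the standard binomial estimates. A short derivation, assuming $k \le n/2$ so that $\log(n/k) \ge 1$ (logarithms base 2):

```latex
\left(\frac{n}{k}\right)^k \le \binom{n}{k} \le \left(\frac{en}{k}\right)^k
\;\Longrightarrow\;
k\log\frac{n}{k} \;\le\; \log\binom{n}{k} \;\le\; k\log\frac{n}{k} + k\log e \;\le\; (1+\log e)\,k\log\frac{n}{k},
\qquad
\log N = \log\binom{n}{k} + k\log O(1/\delta_k) = \Theta\!\left(k\log\frac{n}{k}\right)\ \text{for constant } \delta_k .
```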

Applying the JL lemma in this black-box way using the sparse JL matrices of [KN12] yields a factor-$\varepsilon$ improvement in sparsity over using a random dense JL construction, for example one with random Gaussian entries. However, some examples have shown that it is possible to do much better by not using the JL lemma statement as a black box, but rather by analyzing the sparsity required from the constructions in [KN12] "from scratch" for the problem at hand. For example, the work [NN12] showed that one can have column sparsity $O(1/\varepsilon)$ with $m = O(d^{1+\gamma}/\varepsilon^2)$ rows in an OSE for any $\gamma > 0$, which is much better than the column sparsity $O(d/\varepsilon)$ that is obtained by using the sparse JL theorem as a black box.

We thus pose the following open problem in the realm of understanding sparse embedding matrices better. Let $\mathcal{D}$ be an OSNAP distribution [NN12] over $\mathbb{R}^{m\times n}$ with column sparsity $s$. The class of OSNAP distributions includes both of the sparse JL distributions in [KN12], and more generally an OSNAP distribution is characterized by the following three properties, where $A$ is a random matrix drawn from $\mathcal{D}$:

• All entries of $A$ are in $\{0, 1/\sqrt{s}, -1/\sqrt{s}\}$. We write $A_{ij} = \delta_{ij}\sigma_{ij}/\sqrt{s}$, where $\delta_{ij}$ is an indicator random variable for the event $A_{ij} \neq 0$, and the $\sigma_{ij}$ are independent uniform $\pm 1$ random variables.

• For any $j \in [n]$, $\sum_{i=1}^m \delta_{ij} = s$ with probability 1.

• For any $S \subseteq [m]\times[n]$, $\mathbb{E}\prod_{(i,j)\in S}\delta_{ij} \le (s/m)^{|S|}$.
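One simple member of the OSNAP class (exactly $s$ non-zeros per column, placed on $s$ distinct rows chosen uniformly at random, with independent random signs) can be sampled as follows. This is a Python sketch with hypothetical function names. Choosing the rows uniformly without replacement makes the indicators $\delta_{ij}$ within a column negatively correlated, and columns are independent, so the third property holds.

```python
import math
import random


def sample_osnap_like(m, n, s, seed=0):
    """m x n matrix (list of rows) with exactly s entries +-1/sqrt(s) per column on distinct rows."""
    rng = random.Random(seed)
    A = [[0.0] * n for _ in range(m)]
    for j in range(n):
        for i in rng.sample(range(m), s):                    # delta_ij = 1 on s distinct rows
            A[i][j] = rng.choice((-1.0, 1.0)) / math.sqrt(s)  # sigma_ij uniform in {-1, +1}
    return A


def column_support_sizes(A):
    m, n = len(A), len(A[0])
    return [sum(1 for i in range(m) if A[i][j] != 0.0) for j in range(n)]


A = sample_osnap_like(m=64, n=200, s=4)
print(set(column_support_sizes(A)))  # {4}
```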

Given a set of vectors $V \subseteq \mathbb{R}^n$, what is the tradeoff between the number of rows $m$ and the column sparsity $s$ required for a random matrix $A$ drawn from an OSNAP distribution to preserve all $\ell_2$ norms of vectors $v \in V$ up to $1 \pm \varepsilon$ simultaneously, with positive probability, as a function of the geometry of $V$? We are motivated to ask this question by a result of [KM05], which states that for a set of vectors $V \subseteq \mathbb{R}^n$ all of unit $\ell_2$ norm, a matrix with random subgaussian entries preserves all $\ell_2$ norms of vectors in $V$ up to $1 \pm \varepsilon$ as long as the number of rows $m$ satisfies

$$m \ge C\varepsilon^{-2}\left(\mathbb{E}_g\sup_{x\in V}|\langle g, x\rangle|\right)^2, \qquad (7)$$

where $g \in \mathbb{R}^n$ has independent Gaussian entries of mean 0 and variance 1. The bound on $m$ in [KM05] is actually stated as $C\varepsilon^{-2}(\gamma_2(V, \|\cdot\|_2))^2$, where $\gamma_2$ is the $\gamma_2$ functional, but this is equivalent to Eq. (7) up to a constant factor; see [Tal05] for details. Note that Eq. (7) easily implies the $m = O(d/\varepsilon^2)$ bound for OSEs by letting $V$ be the unit sphere in any $d$-dimensional subspace, and also implies that $m = O(\delta_k^{-2}k\log(n/k))$ suffices for RIP matrices by letting $V$ be the set of all $k$-sparse vectors of unit norm.
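The two special cases of Eq. (7) mentioned above can be made concrete. For $V$ the unit sphere of a $d$-dimensional subspace, $\sup_{x\in V}|\langle g,x\rangle|$ is the norm of the projection of $g$ onto the subspace, a $\chi_d$ variable whose mean $\sqrt{2}\,\Gamma((d+1)/2)/\Gamma(d/2)$ is at most $\sqrt{d}$. For unit $k$-sparse vectors, the supremum is the $\ell_2$ norm of the $k$ largest $|g_i|$. The Python sketch below (hypothetical names, arbitrary parameters, and a fixed-seed Monte Carlo estimate for the sparse case) computes both squared widths, to be compared with $d$ and with $k\ln(n/k)$.

```python
import math
import random


def subspace_width_sq(d):
    """(E ||P g||_2)^2 for P the projection onto a d-dimensional subspace, via the chi_d mean."""
    mean = math.sqrt(2) * math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2))
    return mean * mean


def sparse_width_sq(n, k, trials=300, seed=0):
    """Monte Carlo estimate of (E sup over unit k-sparse x of |<g, x>|)^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        top = sorted((abs(v) for v in g), reverse=True)[:k]  # best support: k largest |g_i|
        total += math.sqrt(sum(v * v for v in top))          # sup = l2 norm of those entries
    return (total / trials) ** 2


for d in (10, 100, 1000):
    print(d, subspace_width_sq(d))  # slightly below d

n, k = 1000, 10
print(sparse_width_sq(n, k), k * math.log(n / k))
```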

Note that the resolution of this question will not just be in terms of the $\gamma_2$ functional. In particular, for constant $\delta_k$ we see that $m, s = \Theta((\gamma_2(V))^2)$ is necessary and sufficient when $V$ is the set of all unit norm $k$-sparse vectors. Even increasing $m$ to $\Theta((\gamma_2(V))^{2+\gamma})$ does not decrease the lower bound on $s$ by much. Meanwhile for $V$ the unit sphere of a $d$-dimensional subspace, we can simultaneously have $m = O((\gamma_2(V))^{2+\gamma}/\varepsilon^2)$ and $s = O(1/\varepsilon)$, not depending on $\gamma_2(V)$ at all.



References 



[AC09] Nir Ailon and Bernard Chazelle. The Fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., 39(1):302-322, 2009.

[Ach03] Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003.

[AHK06] Sanjeev Arora, Elad Hazan, and Satyen Kale. A fast random sampling algorithm for sparsifying matrices. In Proceedings of the 10th International Workshop on Randomization and Computation (RANDOM), pages 272-279, 2006.

[AL09] Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. Discrete Comput. Geom., 42(4):615-630, 2009.

[AL11] Nir Ailon and Edo Liberty. Almost optimal unrestricted fast Johnson-Lindenstrauss transform. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 185-191, 2011.

[Alo09] Noga Alon. Perturbed identity matrices have high rank: Proof and applications. Combinatorics, Probability & Computing, 18(1-2):3-15, 2009.

[AV06] Rosa I. Arriaga and Santosh Vempala. An algorithmic theory of learning: Robust concepts and random projection. Machine Learning, 63(2):161-182, 2006.

[BD08] Thomas Blumensath and Mike E. Davies. Iterative hard thresholding for compressed sensing. J. Fourier Anal. Appl., 14:629-654, 2008.

[BDDW08] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 28:253-263, 2008.

[BI09] Radu Berinde and Piotr Indyk. Sequential sparse matching pursuit. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, pages 36-43, 2009.

[BIPW10] Khanh Do Ba, Piotr Indyk, Eric Price, and David P. Woodruff. Lower bounds for sparse recovery. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1190-1197, 2010.

[BIR08] Radu Berinde, Piotr Indyk, and Milan Ruzic. Practical near-optimal sparse recovery in the L1 norm. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, pages 198-205, 2008.

[BOR10] Vladimir Braverman, Rafail Ostrovsky, and Yuval Rabani. Rademacher chaos, random Eulerian graphs and the sparse Johnson-Lindenstrauss transform. CoRR, abs/1011.2590, 2010.

[Can08] Emmanuel J. Candes. The restricted isometry property and its implications for compressed sensing. C. R. Acad. Sci. Paris, 346:589-592, 2008.

[Cha10] Venkat B. Chandar. Sparse Graph Codes for Compression, Sensing, and Secrecy. PhD thesis, Massachusetts Institute of Technology, 2010.

[CRT06a] Emmanuel J. Candes, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52(2):489-509, 2006.

[CRT06b] Emmanuel J. Candes, Justin Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 2006.

[CT05] Emmanuel J. Candes and Terence Tao. Decoding by linear programming. IEEE Trans. Inf. Theory, 51(12):4203-4215, 2005.

[CT06] Emmanuel J. Candes and Terence Tao. Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory, 52:5406-5425, 2006.

[CW12] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. CoRR, abs/1207.6365v2, 2012.

[DG03] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22(1):60-65, 2003.

[DKS10] Anirban Dasgupta, Ravi Kumar, and Tamas Sarlos. A sparse Johnson-Lindenstrauss transform. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC), pages 341-350, 2010.

[DMMW12] Petros Drineas, Malik Magdon-Ismail, Michael Mahoney, and David Woodruff. Fast approximation of matrix coherence and statistical leverage. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

[Don06] David L. Donoho. Compressed sensing. IEEE Trans. Inf. Theory, 52(4):1289-1306, 2006.

[DTDS12] David L. Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck. Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory, 58:1094-1121, 2012.

[FM88] Peter Frankl and Hiroshi Maehara. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Comb. Theory, Ser. B, 44(3):355-362, 1988.

[Fou11] Simon Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal., 49(6):2543-2563, 2011.

[GG84] Andrej Y. Garnaev and Efim D. Gluskin. On the widths of the Euclidean ball. Soviet Mathematics Doklady, 30:200-203, 1984.

[GK09] Rahul Garg and Rohit Khandekar. Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 337-344, 2009.

[Gor88] Yehoram Gordon. On Milman's inequality and random subspaces which escape through a mesh in $\mathbb{R}^n$. Geometric Aspects of Functional Analysis, pages 84-106, 1988.

[IM98] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th ACM Symposium on Theory of Computing (STOC), pages 604-613, 1998.

[Ind01] Piotr Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS), pages 10-33, 2001.

[IR08] Piotr Indyk and Milan Ruzic. Near-optimal sparse recovery in the L1 norm. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 199-207, 2008.

[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984.

[Kas77] Boris Sergeevich Kasin. The widths of certain finite-dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat., 41(2):334-351, 478, 1977.

[KM05] Bo'az Klartag and Shahar Mendelson. Empirical processes and random projections. J. Funct. Anal., 225(1):229-245, 2005.

[KN10] Daniel M. Kane and Jelani Nelson. A derandomized sparse Johnson-Lindenstrauss transform. CoRR, abs/1006.3585, 2010.

[KN12] Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. In SODA, pages 1195-1206, 2012.

[KW11] Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the Restricted Isometry Property. SIAM J. Math. Anal., 43(3):1269-1281, 2011.

[LDP07] Michael Lustig, David Donoho, and John M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR Imaging. Magnetic Resonance in Medicine, 58:1182-1195, 2007.

[Mat08] Jiri Matousek. On variants of the Johnson-Lindenstrauss lemma. Random Struct. Algorithms, 33(2):142-156, 2008.

[MM12] Xiangrui Meng and Michael W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. CoRR, abs/1210.3135, 2012.

[MP12] Gary L. Miller and Richard Peng. Iterative approaches to row sampling. Manuscript, 2012.

[NN12] Jelani Nelson and Huy L. Nguyen. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. Manuscript, 2012.

[NT09] Deanna Needell and Joel A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26:301-332, 2009.

[NV09] Deanna Needell and Roman Vershynin. Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Foundations of Computational Mathematics, 9(3):317-334, 2009.

[NV10] Deanna Needell and Roman Vershynin. Signal recovery from inaccurate and incomplete measurements via regularized orthogonal matching pursuit. IEEE Journal of Selected Topics in Signal Processing, 4:310-316, 2010.

[Sar06] Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 143-152, 2006.

[Tal05] Michel Talagrand. The generic chaining: upper and lower bounds of stochastic processes. Springer Verlag, 2005.

[TG07] Joel A. Tropp and Anna C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory, 53(12):4655-4666, 2007.

[Tro11] Joel A. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal., Special Issue on Sparse Representation of Data and Images, 3(1-2):115-126, 2011.

[Vem04] Santosh Vempala. The random projection method, volume 65 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, 2004.

[WDL+09] Kilian Q. Weinberger, Anirban Dasgupta, John Langford, Alexander J. Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 1113-1120, 2009.

[ZWSP08] Yunhong Zhou, Dennis M. Wilkinson, Robert Schreiber, and Rong Pan. Large-scale parallel collaborative filtering for the netflix prize. In Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management (AAIM), pages 337-348, 2008.